Abstract

The use of microprogramming to improve the performance
of application programs was investigated. The application
programs used in the study were from various research
laboratories at Wright-Patterson Air Force Base, Ohio. The
user-microprogrammable Hewlett-Packard (HP) 21MX minicomputer
was used for the investigation.

Two application programs were chosen as candidates for
microprogramming: a wind tunnel stress analysis program and a
laser materials modeling program. An activity profile
generator program was used to analyze the programs and
determine where microprogramming should be applied. The
microcode for the programs was implemented, and the speed
improvement of the resultant programs was measured.

The study further looked at the feasibility of automating
the microprogramming tuning process on the HP 21MX
computer. Approaches to automatically selecting program
segments for microprogramming and automatically synthesizing
the microcode were discussed.

Applications Directed Microprogramming on a Minicomputer System


I.  Introduction 


Introduction 

General purpose computers are by definition designed
to be used for a wide variety of applications, and thus are
very versatile. It is because of this versatility, however,
that these computers are inherently inefficient for
many applications. The ideal situation, from a performance
point of view, would be to have a computer which was designed
specifically for each application. Since this is
not realistic, the user must usually accept the performance
of the general purpose computer. For most applications,
this is quite acceptable.

Some applications, however, may have requirements
which exceed the capability of the general purpose computer.
The user may then be forced to buy a special purpose
machine -- a very expensive solution to the performance
problem. If, however, the user's general purpose
machine is user-microprogrammable, another possible solution
exists. The user-microprogrammable computer can often
be "tuned" using microprogramming to meet the specifications
of special application programs. That is, special
instructions can be added to the computer's instruction set
which will more efficiently perform the basic operations or
"primitives" of the application program. T.G. Rauscher
notes: "The efficiency of solving a particular problem depends
primarily on the degree to which the architecture
supports the problem primitives" (Ref. 1:1006).

It is this use of microprogramming to improve the
performance of application programs which is the subject of
this thesis investigation. This introductory chapter covers
background information, the specific problem investigated,
and the approach taken to solve the problem.


Background 

The origin of microprogramming can be traced back to
1951 when M.V. Wilkes (Ref. 2) proposed using "microprogrammes"
as an alternative to the "ad hoc manner" in which
computer control units were being designed. The technique
was not widely used commercially until the mid-1960s when
IBM introduced the microprogrammed version of the
System/360 (Ref. 3). Since then microprogramming has been
widely used in the design of computer control units.

The introduction of a writable control store (WCS),
that is, read/write memory used to store microprograms, made
it practical to use microprogramming to improve the performance
of application programs. Depending on the application,
performance can mean such things as speed, accuracy,
or special data formats (Ref. 4:25). Speed is the
primary performance measure considered here. Properly applied,
microprogramming can increase the speed of a program
considerably. Gains of six to ten times or more are possible
(Ref. 5:98). Because of limited memory available for
microprograms and because of the complexity of the
microprogramming task, it is not possible to completely
microprogram most application programs. Microprogramming is
therefore applied at points in the application program
where most of the execution time is spent. For a more detailed
discussion of microprogramming and how it provides
improvement in program execution time, the reader should
refer to Appendix A. Appendix B contains a glossary of
terms used in this report.

Problem 

The problem considered in this thesis investigation is
the use of microprogramming on the user-microprogrammable
Hewlett-Packard (HP) 21MX computer. This machine is used
in several of the laboratories at Wright-Patterson Air
Force Base (WPAFB) for a variety of specialized applications.
Currently, little or no use is made of the
microprogramming capability of the machine.

Previous work at the Air Force Institute of Technology
(AFIT) on this problem was done by John J. Steidle (Ref.
6). Steidle implemented the user-microprogramming capability
on the AFIT Digital Engineering Laboratory (DEL) HP
21MX and began the study of applying microprogramming to
application programs. He was able to complete one
microprogram -- a bit-reversal routine for a Fast Fourier
Transform (FFT) program. This thesis effort is essentially
a continuation of his work.

Scope 

The major objective of this thesis effort is to promote
the use of user-microprogramming by:

1.  Demonstrating its benefits in actual working
application programs.

2.  Investigating approaches which will aid other
users in future microprogramming tuning efforts.

The result of this and future efforts will hopefully be
improved program performance and extension of the useful
life of the HP 21MX computer.

Approach 

The  approach  taken  in  this  thesis  investigation  is 
outlined  in  the  following  steps: 

1.  A literature search.

2.  Identification of existing application programs
which could benefit from microprogramming.

3.  Analysis of those programs to determine where
microprogramming should be applied.

4.  Design and implementation of the microprogrammed
routines.

5.  Analysis of the resulting programs.

6.  Investigation of approaches which would simplify
the tuning process.

A literature search was conducted to gain necessary
background and to learn what related work had been done.
The search revealed that research had been done in both
manual (Refs. 1, 4-12) and automatic (Refs. 13-19) techniques
of architecture tuning.

Identification of candidate programs was accomplished
by contacting HP users on base. An initial list of users
was already available (Ref. 6:Appendix C). Users were interviewed
to determine what application programs were
available and which of these would make good candidates for
microprogramming. Source code of selected programs was
then obtained for further analysis.

Analysis of the selected programs was done using an
activity profile generator program (Ref. 6:22). This program
monitors the execution of an application program, and
generates a table and a histogram showing the relative execution
times of the various routines of that program.

The analysis of the programs identified potential
routines for microprogramming. The microroutines were then
designed, coded, and substituted back into the original
programs. The resultant programs were analyzed, and the
execution times were compared with the execution times of
the original programs.

Based on the experience gained through the above manual
tuning process and the work of others found in the literature
search, the investigation of approaches to simplify
the process was begun. The goal of this effort was to investigate
the feasibility of completely automating the
process and developing an automatic tuning system for the
AFIT HP 21MX computer. Each step of the process was
studied to determine if it could be automated, and the
availability of software and algorithms to support the
tuning step was examined.


Limitations 

This thesis investigation was limited in several areas
because of various factors. The study was confined to applications
of microprogramming on the HP 21MX, although the
concepts could be applied to any user-microprogrammable
computer. The number of application programs tuned was
limited by the number of potential programs identified by
the base users and by the time frame of the thesis effort.
The size of the microprograms was limited by the size of
the WCS on the AFIT machine -- 256 words.


Order  of  Presentation 

This report consists of seven chapters. Chapter I
provides an introduction and outlines the problem considered
and the approach taken to solve that problem. Chapter II
covers the survey of HP users and the two application programs
found as candidates for microprogramming. The analysis
of the two application programs is described in Chapter
III. Chapter IV describes the requirements, design,
implementation, and test of the first microprogram -- a
matrix multiplication routine used for stress calculations
in a wind tunnel control program. The requirements, design,
implementation, and test of the microcode for the
second candidate program -- a laser materials modeling
program -- are covered in Chapter V. Chapter VI discusses
the feasibility of designing an automated tuning system for
the AFIT HP 21MX computer. Chapter VII presents the results,
conclusions, and recommendations of the thesis
investigation.


II.  Survey of HP Users at Wright-Patterson Air Force Base


Introduction 

In  order  to  identify  existing  application  programs 
which  might  benefit  from  microprogramming,  a  survey  of  HP 
users  at  Wright-Patterson  Air  Force  Base  was  conducted. 
This chapter reports the details of this survey -- the
organizations surveyed, the criteria for choosing candidate
programs, and the candidate programs chosen for further
analysis.

Organizations Surveyed

The  survey  was  conducted  through  telephone  contacts 
and  personal  interviews  with  HP  users  whose  names  appeared 
on  an  existing  list  (Ref.  6:  Appendix  C).  An  updated  list 
is  given  in  Appendix  C.  Each  user  contacted  was  given  a 
brief  explanation  of  the  microprogramming  tuning  process 
and  then  asked  if  their  organization  had  any  programs  which 
might  benefit  from  this  process. 

Users from eight separate organizations at Wright-Patterson
were surveyed. All of the organizations have at
least one HP 21MX computer; one organization has four. The
major uses of the HP 21MX differ widely among organizations,
some of the uses being: sensor modeling, electronic
warfare analysis and modeling, materials modeling,
instrument data acquisition and processing, on-line data
acquisition of real-time telemetry data, wind tunnel control,
and random vibration control.

Because of the diverse applications of the HP 21MX,
the application programs of the various organizations have
very little in common at the detailed level. At a more
general level, however, the programs can be divided into
three major categories -- modeling, data acquisition, and
control.

Criteria  for  Candidate  Programs 

Because of the large number of application programs
being run on the HP 21MX, some criteria had to be used in
selecting programs for further analysis for microprogramming.
Meyers (Ref. 20:29) suggests three criteria for determining
whether a function should be implemented in microcode
or software: (1) "the function should be small,"
(2) "the function should be unlikely to change," and (3)
"system performance would suffer from a slower software
implementation of the function." Although Meyers is applying
these criteria to the design of computer architectures,
they are also very applicable to the "tuning" process
considered here, and thus were used in the program
selection process.

The requirement that the function be small is necessary
for two reasons. First, writable control store is very limited
on most user-microprogrammable computers. The AFIT HP
21MX, one of the HP 1000 Series computers, for example, has
only 256 words. Table I (Ref. 7:15) shows the control
store options available for the HP 1000 Computer Series.
Second, microprogramming is inherently more difficult than
programming in a higher level language, because of the
lower-level details the programmer must be concerned with,
such as register and bus transfers, arithmetic logic unit
(ALU) operations, and microinstruction timing. For this
reason microprogramming should be held to a minimum.

The two reasons for the first criterion also apply to
the second, that "the function should be unlikely to
change." Limited writable control store makes program
growth difficult, if not impossible in many cases. The
complexity of microprogramming makes the modifications much
more expensive.

The third criterion points to the program's need for
performance improvement. This may be the most important of
the three criteria. If a user feels the performance of a
program is already adequate, there is no need to add expense
and complexity to it by adding microcode.

One additional criterion that should be considered is
a program's potential for improvement using microprogramming.
This potential is based on the nature of the program.
A compute-bound program is more likely to be improved
by microprogramming than an I/O-bound program. A
plotting program that spends the majority of its execution
time waiting for the mechanical plotter would gain nothing
in performance from microprogramming.

                              TABLE I

     User Microprogrammability Provisions of HP 1000 Computers

                                             HP 1000 Computer Series
                                                   M         E         F

HP 1000 Computer Models
  With 4 I/O channels                           2105
  With 9 I/O channels                           2108      2109      2111
  With 14 I/O channels                          2112      2113      2117

Control Store Space (microinstructions)
  Total control store address space             4096     16384     16384
  Space used by base instruction set            1024      1024      2816
  Space reserved for HP enhancements            1536      3584      7936
  Space reserved for user microprograms         1536     11776      5632

Control Store Hardware for the User
  12945A 256-instruction User Control
    Store board for user-installed ROMs    max. of 2       n/a       n/a
  13407A 2048-instruction User Control
    Store board for user-installed ROMs    max. of 1 max. of 1 max. of 1
  13197A 1024-instruction Writable
    Control Store board                    max. of 2 max. of 3 max. of 3
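
The control store figures in Table I can be cross-checked with a little arithmetic. The following is only an illustrative sketch in Python (not a language used in this study); the dictionary layout and the function names are assumptions made for the example, while the numbers are those of Table I.

```python
# Control store figures from Table I, in microinstructions, keyed by
# HP 1000 series. The field names are hypothetical labels for this sketch.
CONTROL_STORE = {
    "M": {"total": 4096, "base": 1024, "hp_reserved": 1536, "user": 1536},
    "E": {"total": 16384, "base": 1024, "hp_reserved": 3584, "user": 11776},
    "F": {"total": 16384, "base": 2816, "hp_reserved": 7936, "user": 5632},
}


def partitions_consistent(series):
    """Return True if the three partitions add up to the total address space."""
    entry = CONTROL_STORE[series]
    return entry["base"] + entry["hp_reserved"] + entry["user"] == entry["total"]


def user_fraction(series):
    """Return the fraction of the address space reserved for user microprograms."""
    entry = CONTROL_STORE[series]
    return entry["user"] / entry["total"]


if __name__ == "__main__":
    for series in CONTROL_STORE:
        print(series, partitions_consistent(series),
              "{:.1%}".format(user_fraction(series)))
```

The address space reserved for the user is an upper bound only; the control store actually installed on a given machine, such as the 256 words on the AFIT HP 21MX, may be far smaller.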

Candidate  Programs 

Applying these four criteria to programs examined in
the survey, two application programs from two different
organizations were chosen for further analysis -- a wind
tunnel stress control program and a laser materials modeling
program. The background and general requirements of
these programs are discussed here.

Wind Tunnel Stress Control Program. The wind tunnel
stress control program, called STRES, is one of several
subroutines used to control the overall operation of a
9-inch experimental wind tunnel used by one of the Air Force
Wright Aeronautical Laboratories (AFWAL) organizations --
AFWAL/FIMN (Ref. 21). STRES is used to calculate the
stresses on flexible rods or elements in the wind tunnel.

A cross section of the tunnel with the rods is shown
in Figure 1 (Ref. 22:Figure 3). A total of eighteen rods
form the floor and ceiling of the tunnel, nine rods on each
surface. Each of the rods is connected to ten
electromechanical jacks, which are used to bend the flexible rods
to a desired shape. Bending each of the rods provides the
tunnel with variable geometry walls, which allows "testing
larger models than previously possible in a comparable
sized conventional tunnel" (Ref. 22:1). Figure 2 (Ref.
22:Figure 2) illustrates the effect that bending the rods
has on the shape of the tunnel.

Figure 1.  Wind Tunnel Cross Section

The rods are adjusted each time the tunnel is prepared
for a model test. Although the rods are flexible, it is
important that they are not over-stressed or they will be
permanently distorted. The function of STRES is to prevent
this from happening. STRES receives as one of its input
parameters an array containing the relative distance each
of the ten adjusting jacks has been moved. This information
is used to calculate the stresses and moments on each
rod. These values are then passed to another routine which
automatically shuts down one or more jacks if the maximum
allowable values are exceeded.

Originally  STRES  and  its  associated  subroutines  were 
written  entirely  in  FORTRAN,  but  the  program  was  too  slow 
to  allow  adjusting  of  more  than  one  rod  at  a  time.  Part  of 
one  routine  was  rewritten  in  assembly  language,  and  the 
gain  in  speed  allowed  the  adjusting  of  three  rods  at  a 
time.  This  new  routine  was  called,  quite  appropriately, 
SPEED. 

The rod adjustment process took about 5 minutes. It
was hoped that by microprogramming parts of STRES and
SPEED, the stress calculations could be made fast enough to
allow the simultaneous adjustment of more than three rods
-- possibly two or three times as many -- and thus reduce
the total adjustment time. The ultimate goal was to be
able to adjust all eighteen rods at once! This would allow
real-time adjustments during a test.

The stress calculation function met all of the program
selection criteria, and was considered a prime candidate
for microprogramming. More conventional speed-up techniques
such as reverting to assembly language had been
tried. Microprogramming was a logical next step.

Laser Materials Modeling Program. The laser materials
modeling program is a program which was developed by
personnel at AFWAL/MLPJ (Ref. 23) to model the optical
characteristics of laser materials. This program is used
to calculate the real and imaginary parts of refractive
index, an important measure of laser materials. This measure
is given by the following equations (Refs. 23, 24:1327):

$$n^2 - k^2 = \epsilon_0 + \ldots$$

where:

    $n$ = real part of the refractive index (dimensionless)

    $k$ = imaginary part of the refractive index (dimensionless)

    $\epsilon_0$ = short-wavelength dielectric constant (dimensionless)

    $\rho_j$ = strength of resonance (dimensionless)

    $\nu_j$ = frequency of resonance (cm$^{-1}$)

    $\nu$ = radiation frequency (cm$^{-1}$)

    $\gamma_j$ = damping factor (dimensionless)

    $X$ = a weighting factor (dimensionless)

    $\nu_p$ = plasma frequency (cm$^{-1}$)

    $\nu_\tau$ = relaxation frequency (cm$^{-1}$)

The HP 21MX at this laboratory is equipped with a
potentiometer board consisting of 32 potentiometers. Each
potentiometer of the board can be adjusted to supply different
voltage levels to analog-to-digital (A/D) converters.
The A/D converters are connected to the HP 21MX,
providing digitized inputs of the potentiometer settings.
In the modeling program the potentiometers are used to
provide the trial input parameters for the above equations.
Values for n and k can then be calculated.

Another  important  measure  of  laser  materials  is  re¬ 
flectivity.  The  reflectivity  R  at  normal  incidence  is 
given  by  the  following  equation  (Ref.  24:1327): 

$$R = \frac{(n - 1)^2 + k^2}{(n + 1)^2 + k^2}$$

(n, k, and R are dimensionless)

Once  n  and  k  are  known,  R  can  be  calculated  and  displayed 
on  an  oscilloscope  as  a  function  of  wavelength. 
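
As an illustration only, the normal-incidence reflectivity relation can be evaluated with a very short routine. The sketch below is written in Python rather than the FORTRAN of the modeling program, and the function name is an assumption made for this example.

```python
def reflectivity(n, k):
    """Normal-incidence reflectivity R = ((n-1)^2 + k^2) / ((n+1)^2 + k^2).

    n is the real part and k the imaginary part of the refractive index;
    all three quantities are dimensionless.
    """
    return ((n - 1.0) ** 2 + k ** 2) / ((n + 1.0) ** 2 + k ** 2)


if __name__ == "__main__":
    # A non-absorbing material (k = 0) with n = 1.5 reflects about 4 percent.
    print(reflectivity(1.5, 0.0))
```

In the modeling program this calculation would be repeated at each wavelength of the display, after n and k have been obtained from the refractive index equations.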

An experimental method of obtaining a plot of reflectivity
is by using a spectrometer to measure a laser material
sample. The coordinate points for this measured plot
can then be input to the modeling program and displayed
along with the calculated waveform. An example of what a
display might look like is shown in Figure 3. By adjusting
the potentiometers, the calculated plot can be adjusted to
closely match the measured waveform and the corresponding
refractive index spectra can be calculated. Thus, the
program functions as a curve-fitting tool.

It should be made clear that there are other computerized
techniques for calculating the refractive index, and
that this technique is not meant to replace them. This
experimental method has three objectives (Ref. 23): (1) to
calculate the refractive index of a sample material, (2) to
give a researcher a "feel" for the effects the different
parameters have on the spectra, and (3) to possibly aid in
the synthesis of new laser materials.

Although  this  modeling  program  had  not  yet  been  tested 
at  the  time  of  the  survey,  it  was  anticipated  that  the 
FORTRAN  routine  used  to  calculate  the  refractive  index 
equations  would  not  be  fast  enough  to  provide  an  acceptable 
oscilloscope  display.  This  was  the  motivation  behind  the 
application  of  microprogramming  to  this  routine. 

The routine met all of the program selection criteria.
The function was small. The refractive index equations
were well established, so there was little possibility of
modification. The function should execute faster in microcode.
Since the program had not yet been tested, there
was some question as to whether or not the function would
execute fast enough in FORTRAN. The designer of the program
had had experience with similar FORTRAN routines running
on the HP 21MX, and felt confident that this routine
would not be fast enough to provide an acceptable oscilloscope
display if done in FORTRAN. Later tests confirmed
that he was correct. Considering these factors, the program
was a good candidate for microprogramming.

Conclusions 

The results of this survey were somewhat unexpected,
since only two application programs were selected as candidates
for microprogramming. Several reasons for this can
be given:

(1)  Many programs did not meet the selection criteria.

(2)  A Control Data Corporation (CDC) system is available
to most organizations at Wright-Patterson Air
Force Base and can be used for programs which exceed
the capability of the HP computer.

(3)  One of the organizations (Ref. 25) had potential
programs, but administrative control of the computers
was being transferred to another organization.

(4)  One person interviewed (Ref. 26) had several ideas
for microprogramming applications, but the programs
had either not been written, or were running
on the CDC system.

(5)  Some persons may have been hesitant to involve
themselves or their programs in this project.

Although only two application programs were identified,
the survey was considered successful. The two programs
were "real" operational programs and were good candidates
for microprogramming. Also, the thinking of many
of the HP users was hopefully stimulated toward the use of
microprogramming in future applications.


III.  Analysis of Candidate Programs


Introduction 


As stated in the first chapter, it is not possible or
even desirable to completely microprogram most application
programs, because of limited writable control store and the
complexity of the microprogramming task. Fortunately,
microcoding an entire program is not necessary to increase
its speed. One study of FORTRAN programs (Ref. 28) has
shown that more than 80 per cent of the total execution
time of a program is concentrated in at most four to five
per cent of the instructions. Careful analysis of a program
can reveal these areas of high concentration.

This chapter covers some of the analysis techniques
and the application of one of these techniques to the two
microprogramming candidates: the wind tunnel stress program
and the laser materials modeling program.

Analysis  Techniques 

Several techniques are available for determining where
most of the execution time is spent in a program (Ref.
4:146):

1)  Static instruction analysis

2)  Timing calls

3)  Logic analyzer

4)  Activity profile generator

Static instruction analysis involves the addition of
the execution times of individual instructions in a program
to determine the total execution times of the various segments
of the program. This method can provide good results
if the execution is not data-dependent. Detailed knowledge
of the individual instruction execution times is required,
however, and the process is very tedious if done manually.
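
The bookkeeping involved in static instruction analysis can be sketched as follows. This is a minimal illustration in Python; the function name is an assumption, and the instruction times in the example are placeholder values chosen for the arithmetic, not measured HP 21MX timings.

```python
def segment_time_us(instruction_counts, instruction_times_us):
    """Estimate a segment's execution time in microseconds.

    instruction_counts maps an instruction mnemonic to the number of times
    it executes in the segment; instruction_times_us maps the mnemonic to
    its execution time. Both tables must be supplied by the analyst.
    """
    return sum(count * instruction_times_us[mnemonic]
               for mnemonic, count in instruction_counts.items())


if __name__ == "__main__":
    # Placeholder timings (microseconds), for illustration only.
    times_us = {"LDA": 2.0, "ADA": 2.0, "STA": 2.5, "JMP": 1.5}
    # A four-instruction loop body assumed to execute 100 times.
    loop_counts = {"LDA": 100, "ADA": 100, "STA": 100, "JMP": 100}
    print(segment_time_us(loop_counts, times_us))
```

The execution counts must be known in advance, which is why the method gives good results only when execution is not data-dependent.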

Timing calls to the system clock can be used to determine
the elapsed time between the beginning and end of a
program segment. This technique requires a high-resolution
system clock. Erroneous results may be obtained in a
multitasking system. Also, there is much guesswork involved
in the placement of the timing calls within the program.

A logic analyzer can be used to monitor memory accesses.
If the absolute addresses of the program segments are
known, the logic analyzer can be programmed to monitor
those addresses, and the frequency of a segment's execution
can be determined. This is a good technique, because it
gives very precise measurements without any interference
with the operation of the computer. It does require the
added hardware of the logic analyzer, and like the timing
call technique, involves much guesswork in determining the
program addresses to monitor.

The activity profile generator is a program which runs
in a multitasking system along with the program under test.
The profile generator uses an external interrupt, such as
the system clock, to interrupt the program under test, and
the point of interruption is recorded. These recorded
points of interruption can then be used to generate a table
or histogram of the program's activity, showing immediately
the most active segments of the program. This technique
has the advantage of easy implementation on most systems.
It does introduce some overhead because of the
frequent interruption of the program under test, but the
results of the profile generation are not affected. Also,
because the profile generator runs in a multitasking system,
erroneous results may be obtained, as in the timing
call method. This problem can be solved by the correct
setting of program execution priorities.

One additional technique that has been used (Refs. 17,
29) is a microprogrammed version of the activity profile
generator. This technique requires modifications to the
microprogrammed instruction fetch routine to gather the
statistics on instructions as they are fetched from memory
for execution. Since the fetch routine "sees" each instruction,
a detailed execution profile can be made. The
detail can be down to the number of times each program instruction
is executed, if desired. Some overhead is introduced,
since the instruction fetch time is increased.
This technique is difficult to implement on most commercial
machines, since it requires modification of the instruction
fetch routine in control store ROM. Also, some provision
is needed for turning the profile off under normal operation
of the machine.


Activity  Profile  Generator  Program 

Of the analysis techniques discussed, the activity
profile generator program was chosen for this study.
Static instruction analysis was ruled out because it would
have had to be done manually. The timing call technique
would have involved too much guesswork, especially in the
unfamiliar application programs being analyzed. A logic
analyzer was not available, so it was not seriously considered.
Modification of the instruction fetch routine on a
commercial machine is something which should be done by the
manufacturer rather than the user. For these reasons,
the activity profile generator program was considered
the best choice. Also, one such program called ACTV was
available, and had successfully been used before (Ref.
6:24).

A  listing  of  the  program  ACTV  and  its  two  subroutines 
is  given  in  Appendix  D.  The  subroutine  called  IDGET  was 
added  to  the  original  program  to  allow  ACTV  to  run  on 
AFIT's RTE-III (Real-Time Executive-III) operating system.
This  routine  is  available  as  a  system  library  routine  on 
RTE-IV  systems.  Instructions  for  running  ACTV  under  RTE- 
III  are  given  in  Appendix  E. 

ACTV  monitors  a  program's  activity  by  periodically 
interrupting  the  program,  using  the  system  clock  as  the 
source  of  the  interrupt.  The  interrupt  rate  in  increments 
of ten milliseconds is interactively input by the user.
The  user  also  provides  an  upper  and  lower  bound  on  the 
memory  addresses  of  interest  based  on  the  load  address  of 
the  program.  This  area  of  interest  is  divided  into  50  in¬ 
tervals,  and  a  counter  is  provided  for  each  interval  in  the 
form  of  an  array.  Two  additional  counters  are  used  —  one 
for  addresses  below  the  area  of  interest  and  one  for  ad¬ 
dresses  above.  The  address  at  which  the  program  is  sus¬ 
pended  at  the  time  of  interruption  determines  which  counter 
is  incremented.  The  number  of  counts  or  "hits"  in  each 
address  interval  can  then  be  printed  in  the  form  of  a  table 
and  a  histogram  providing  the  activity  profile  for  the 
program. 
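
The counting scheme described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the ACTV source (which is listed in Appendix D): the function name `make_profile`, the way interval boundaries are computed, and the use of a list of sampled addresses in place of clock interrupts are all assumptions made for the sketch.

```python
NUM_INTERVALS = 50  # the text states the area of interest is split into 50 intervals


def make_profile(lower, upper, sampled_addresses, num_intervals=NUM_INTERVALS):
    """Count sampled program addresses per interval of [lower, upper].

    Returns (below, counts, above): one counter for samples below the area
    of interest, one list of per-interval counters, and one counter for
    samples above it. The equal-width split used here is an assumption.
    """
    if upper < lower:
        raise ValueError("upper bound must not be below lower bound")
    span = upper - lower + 1
    counts = [0] * num_intervals
    below = 0
    above = 0
    for addr in sampled_addresses:
        if addr < lower:
            below += 1
        elif addr > upper:
            above += 1
        else:
            # Scale the offset into the range 0 .. num_intervals - 1.
            index = (addr - lower) * num_intervals // span
            counts[index] += 1
    return below, counts, above


# Hypothetical samples: addresses are written in octal, as in the text.
samples = [0o42671] * 8 + [0o42610, 0o43000] + [0o40600] * 4
below, counts, above = make_profile(0o42604, 0o43126, samples)
```

Each sample stands for the address at which the program was suspended by one clock interrupt, so a finer interrupt rate or a narrower address range gives a more detailed profile for the same run.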

Wind  Tunnel  Stress  Program  Analysis 

The  analysis  of  the  wind  tunnel  stress  calculation 
program  STRES  and  its  subroutine  SPEED  was  performed  using 
the  profile  data  from  several  computer  runs  of  ACTV.  In 
order  to  run  STRES  on  the  AFIT  HP  system,  a  special  test 
driver  program  called  SDRVR  was  obtained  from  AFWAL/FIMN. 
This  program  provided  the  needed  input  parameters  to  STRES 
which  normally  originate  from  special  hardware  of  the  wind 
tunnel  control  system.  For  the  analysis,  SDRVR  was  used  as 
the  calling  program  for  STRES,  which  in  turn  called  SPEED. 
Neither STRES nor SPEED was modified for the analysis.
Listings  for  the  three  programs  are  provided  in  Appendix  F. 

Tables II, III, and IV show the resulting activity
profile tables for three of the ACTV runs. ACTV also outputs
histograms corresponding to the tables, but these were
found to be of little value for the analysis, and are not
shown. Table V is a load map showing the absolute memory
addresses of SDRVR, STRES, and SPEED. The addresses from
the load map are used to correlate the address intervals on
the ACTV tables to the programs.

The first ACTV run looked at the memory addresses from
40531 (all addresses in octal) to 43126, which included
SDRVR, STRES, and SPEED. Table II shows that 10 of the 14
total "hits" or 71 per cent of the activity occurred in
SPEED.

The  next  run  looked  at  addresses  42604  to  43126,  the 
range  of  SPEED.  Table  III  shows  that  8  of  the  10  "hits"  in 
SPEED  occurred  in  the  address  range  of  42666  to  42705. 

To  further  "home"  in  on  the  "hot"  spot,  a  third  run 
was  made  looking  at  the  interval  from  42661  to  43020.  Ta¬ 
ble  IV  shows  the  8  "hits"  occurred  in  the  range  of  42671  to 
42703  or  locations  65  to  77  of  SPEED. 

These locations correspond very closely to the range
of LOOP1 in SPEED. With 80 per cent of the activity of
SPEED occurring in LOOP1, the obvious conclusion to be
drawn is that any microprogramming applied to SPEED should
include the LOOP1 program segment.
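
The correlation between absolute addresses and program-relative locations can be checked with a short Python sketch. The module bounds are taken from the load map of Table V; the dictionary `LOAD_MAP` and the function `locate` are names introduced here for illustration only.

```python
# Module name -> (first, last) absolute address, in octal, from Table V.
LOAD_MAP = {
    "SDRVR": (0o40531, 0o42110),
    "STRES": (0o42111, 0o42603),
    "SPEED": (0o42604, 0o43126),
}


def locate(address, load_map=LOAD_MAP):
    """Return (module, relative offset) for an absolute address, or None."""
    for name, (first, last) in load_map.items():
        if first <= address <= last:
            return name, address - first
    return None


# The bounds of the "hot" interval reported for the third ACTV run.
low_module, low_offset = locate(0o42671)
high_module, high_offset = locate(0o42703)
print(low_module, oct(low_offset), oct(high_offset))  # SPEED 0o65 0o77
```

Both bounds fall inside SPEED, at relative locations 65 and 77 (octal), which agrees with the locations quoted in the text.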

TABLE V

Load Map for SDRVR, STRES, and SPEED

COM      40002   40530
SDRVR    40531   42110
STRES    42111   42603
SPEED    42604   43126
..MAP    43127   43222   751101   24998-16001
CLRIO    43223   43231   750701   24998-16001

ENTRY POINTS

*SDRVR   41673
*.DLD    104200
*.DST    104400
*..MAP   43127
*EXEC    12446
*CLRIO   43223
*STRES   42244
*.FMP    105040
*.ENTR   37201
*FLOAT   105120
*SPEED   42611

Laser Materials Modeling Program Analysis

The analysis of the laser materials modeling program
subroutine called CALC followed the scheme that was used in
the  analysis  of  the  wind  tunnel  program.  A  special  driver 
program  called  CDRVR  was  required  to  provide  the  needed 
input  parameters  for  the  routine  CALC.  Listings  for  the 
two  FORTRAN  programs,  CDRVR  and  CALC,  are  in  Appendix  I. 

As  with  the  previous  program,  three  activity  profiles 
were  run.  Tables  VI,  VII,  and  VIII  show  the  resulting  ac¬ 
tivity  profile  tables,  and  Table  IX  shows  the  load  map  for 
CDRVR  and  CALC. 

The  first  run  included  both  CDRVR  and  CALC  in  the  area 
of  interest,  from  location  40002  to  40651.  Table  VI  shows 
a  large  concentration  of  "hits"  in  the  address  range  from 
40233  to  40354,  somewhere  in  the  middle  of  CALC.  The  num¬ 
ber  of  "hits"  in  this  area  is  33  of  the  total  48,  or  69  per 
cent. 

The  second  run  included  CALC  only  from  location  40142 
to  40651.  Table  VII  shows  two  areas  of  relatively  high 
"hit"  concentration.  One  large  area  from  40266  to  40347 
contains  23  "hits"  or  48  per  cent  of  the  total.  The  other 
smaller  area  from  40223  to  40250  has  8  "hits"  or  17  per 
cent  of  the  total. 

The  third  run  covered  these  two  areas  more  closely 
from  40214  to  40365.  Table  VIII  still  shows  the  two  areas 
of  concentration,  but  within  those  areas,  the  "hits"  are 
fairly  evenly  distributed.  One  exception  is  the  interval 
from  40332  to  40335,  which  has  6  "hits",  a  relatively  high 
concentration. A mixed FORTRAN/assembly listing shows that
this interval immediately follows a floating point divide
instruction, which is the slowest of all the floating point
instructions. This accounts for the high concentration at
that point.

TABLE IX

Load Map for CDRVR and CALC

CDRVR    40002   40141
CALC     40142   40651
ERRO     40652   40760   750701        24998-16001
SQRT     40761   41107   751101        24998-16001
.OPSY    41110   41147   750701        24998-16001
CLRIO    41150   41156   750701        24998-16001
..FCM    41157   41173   750701        24998-16001
REIO     41174   41276   92001-16005   741120
ERO.E    41277   41277   750701        24998-16001
.PWR2    41300   41332   750701        24998-16001
.DFER    41333   41404   750701        24998-16001

ENTRY POINTS

*CDRVR   40076
*EXEC    12446
*CLRIO   41150
*CALC    40147
*.FMP    105040
*.FDV    105060
*.FAD    105000
*.FSB    105020
*..FCM   41157
*.MPY    100200
*.DLD    104200
*.DST    104400
*.ENTR   37201
*SQRT    40771
*FLOAT   105120
*ERRO    40652
*REIO    41200
*ERO.E   41277
*.OPSY   41110
*.PWR2   41300
*.ZRNT   02001
*.ZPRV   02001
*.DFER   41333
*$LIBR   12665
*$LIBX   13463

Close  examination  of  the  CALC  listing  shows  that  all 
the  "hits"  in  the  specified  range  in  Table  VIII  occur  in 
the  "DO"  loop  of  the  program.  Furthermore,  the  "hits"  are 
concentrated  in  the  area  of  the  loop  where  many  floating 
point  operations  take  place.  The  microprogramming  effort 
in  CALC  should  be  concentrated  on  this  loop,  microcoding 
the  entire  loop  if  possible. 

Summary 

This  chapter  covered  the  analysis  techniques  for 
finding  the  time-consuming  areas  of  a  higher  level  or  as¬ 
sembly  language  program.  Two  programs  were  analyzed  in 
detail  using  an  activity  profile  generator  program.  The 
first  program,  the  wind  tunnel  stress  routine,  showed  a 
high  activity  concentration  in  a  very  small  loop.  The 
second  program,  the  laser  materials  modeling  routine,  had 
its  area  of  high  activity  also  in  a  loop.  The  loop  of  the 
second  program,  however,  was  much  larger,  containing  sev¬ 
eral  lengthy  FORTRAN  statements.  Both  of  the  loops  of  the 
two  programs  could  be  microcoded. 


IV. Requirements, Design, Implementation and Test of a

Microprogram for the Wind Tunnel Control Program

Introduction 

The  previous  chapter  covered  the  analysis  of  the 
stress  calculation  routine  (STRES)  for  the  wind  tunnel 
control  program.  This  analysis  showed  that  about  80  per 
cent  of  the  program  activity  in  the  assembly  language  sub¬ 
routine  of  STRES  called  SPEED  occurred  in  one  loop  segment 
of  SPEED.  This  chapter  covers  the  detailed  requirements, 
design, implementation and test of a microprogram called
LOADS to replace this loop segment in SPEED.

LOADS  Requirements 

LOADS  is  the  microprogram  designed  to  replace  the  loop 
segment labeled LOOP2 in SPEED. LOOP1 is actually the loop
in  which  most  of  the  activity  was  found  to  occur,  but  it  is 
nested  within  LOOP2.  Because  of  this  relationship  between 
the  two  loops,  it  was  decided  to  include  LOOP2  in  the  de¬ 
sign  of  LOADS. 

First and second level data flow diagrams (DFDs) (Ref.
30:Chapt. 4) of SPEED are given in Figure 4 to show the
relationship of LOOP2 to the rest of SPEED. As shown in
the diagrams, SPEED performs three major functions:
computation of loads (forces), moments, and stresses. LOOP2,
along with its inner loop labeled LOOP1, performs the load
computation function. The purpose of this function is to
calculate the loads on the individual flexible rods of the
wind tunnel. Calculation of the loads is an intermediate
step toward computing the moments and stresses on each rod,
as shown in the second level DFD of Figure 4. The
calculation of the loads consists of a simple matrix
multiplication.

Figure 4. Level 1 and 2 DFDs for Subroutine SPEED

Figure 5. DFD for LOADS Microprogram

Matrix multiplication is defined as follows (Ref.
31:343): Given $A=(a_{ij})$, an $m \times n$ matrix, and
$B=(b_{ij})$, an $n \times p$ matrix, the product $AB$ is a
matrix $C=(c_{ij})$ where:

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

and the matrix $C$ is of order $m \times p$. The two matrices
involved in the load calculation are called DDFL and YZT.

DDFL is a 13 by 13 matrix, which represents the inverse
matrix of the deflections of a single wind tunnel rod
due  to  a  unit  load  applied  to  the  rod  at  one  point.  There 
are  13  jacks  attached  to  each  rod,  giving  13  deflection 
points  and  13  points  to  apply  a  unit  load. 

YZT is a 14 by 1 matrix which represents the deflections
in inches from the neutral position of the 13 jacks
attached to each rod. YZT(1) represents the point at which
a rod is attached to the tunnel wall, and thus always has a
deflection of zero. YZT(2) through YZT(4) are the
deflections of the manual jacks used to position each rod.
YZT(5) through YZT(14) are the deflections of the ten
electric jacks used for the same purpose. The values for
the electric jack deflections are actually the periodic
readings from potentiometers attached to each rod (one
potentiometer per rod). The first element of YZT, YZT(1), is
not used in the matrix multiplication. This makes the YZT
matrix effectively a 13 by 1 matrix, satisfying the
dimension requirements for matrix multiplication.

The  result  of  the  multiplication  of  DDFL  and  YZT  is  a 
13  by  1  matrix  called  LOAD.  LOAD  is  actually  a  14  by  1 
matrix  like  YZT  with  the  first  element  set  to  zero,  and  the 
remaining  13  elements  are  the  result  of  the  matrix  multi¬ 
plication.  The  reason  the  YZT  and  LOAD  matrices  have  an 
extra  element  is  because  of  requirements  in  other  routines 
of  the  wind  tunnel  control  program.  For  the  purpose  of  the 
matrix  multiplication,  they  are  13  by  1  matrices  and  will 
be  referred  to  as  such. 

The data flow for the matrix multiplication of DDFL
and YZT is shown in the DFD of Figure 5. This DFD is a
further breakdown of the "COMPUTE LOADS" "bubble" of the
DFD of Figure 4. The data flow here is very simple.
Elements of DDFL and YZT are obtained from their respective
matrices. The two elements are multiplied, and the product
is added to the appropriate LOAD element. The control
involved in this process cannot be shown in a DFD, but is
described in the following structured English (Ref.
30:Chapt. 6) requirements specification of the DDFL and YZT
matrix multiplication routine:


REPEAT UNTIL NO MORE ROWS IN THE DDFL MATRIX
    SET NEXT ELEMENT OF LOAD MATRIX TO 0
    REPEAT UNTIL NO MORE COLUMNS IN THE DDFL MATRIX
        MULTIPLY NEXT DDFL AND YZT ELEMENTS
        ADD THE PRODUCT TO THE CURRENT LOAD ELEMENT
        POINT TO THE NEXT YZT ELEMENT
        POINT TO THE NEXT DDFL COLUMN
    END
    POINT TO THE NEXT LOAD ELEMENT
    POINT TO THE NEXT DDFL ROW
END


The matrix multiplication consists of two loops, one within
the other, as shown in the above structured English
specification. The inner loop corresponds to LOOP1 in SPEED,
and the outer loop corresponds to LOOP2.
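
For readers who prefer code to structured English, the same computation can be sketched in Python. This is a minimal illustration, not the SPEED assembly code: the function name `compute_loads`, the row-by-row list layout of DDFL, and the zero-based indexing (so that `yzt[0]` plays the role of the unused YZT(1)) are assumptions made for the sketch.

```python
ROD_POINTS = 13  # 13 jacks per rod, so DDFL is 13 by 13


def compute_loads(ddfl, yzt):
    """Multiply the 13 by 13 DDFL matrix by the last 13 elements of YZT.

    ddfl is a list of 13 rows of 13 values; yzt has 14 values, the first
    of which is not used. The result has 14 values with the first left
    at zero, mirroring the LOAD matrix described in the text.
    """
    load = [0.0] * (ROD_POINTS + 1)
    for row in range(ROD_POINTS):          # outer loop (LOOP2 in the text)
        total = 0.0                        # set next LOAD element to 0
        for col in range(ROD_POINTS):      # inner loop (LOOP1 in the text)
            total += ddfl[row][col] * yzt[col + 1]
        load[row + 1] = total
    return load


# Usage with an identity matrix: the loads then equal the deflections.
identity = [[1.0 if r == c else 0.0 for c in range(ROD_POINTS)]
            for r in range(ROD_POINTS)]
deflections = [0.0] + [float(i) for i in range(1, ROD_POINTS + 1)]
loads = compute_loads(identity, deflections)
```

The inner loop performs 13 multiply-and-add operations for each of the 13 rows, which is why nearly all of the measured activity falls inside it.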




Design  of  LOADS 

The basic design of the LOADS microprogram is shown in
Figure 6 in the form of a structure chart (Ref. 30:Chapt.
7). This chart is the result of transform analysis (Ref.
30:Chapt. 9), which is a design technique that builds a
system around the concept of data transformation. In the
case of LOADS, the data elements of the DDFL and YZT
matrices are transformed into an element of the LOAD matrix.
This transformation is shown in the DFD of LOOP1, from
which the structure chart is drawn. The reader should note
that the data names on the structure chart are for design
purposes only and do not actually exist in the microcode.
Data at the micro-level exists in registers, and the
applicable registers used for temporary storage are shown in
parentheses below the corresponding data name.

If LOADS were to be implemented in a higher level language,
the design of the routine would essentially be done.
It  is  a  simple  process  to  code  a  FORTRAN  or  PASCAL  program 
from  the  above  structured  English  using  the  modular  design 
of  the  structure  chart.  Implementing  the  routine  in  mi¬ 
crocode,  however,  requires  the  design  to  go  to  an  even 
lower  level.  The  following  algorithmic  steps  take  the  de¬ 
sign  to  a  sufficiently  low  level  to  write  the  microcode: 

1.  Read calling parameters (addresses for the DDFL,
YZT, and LOAD matrices) from memory and store them
into their respective scratch registers.

2.  Store  an  outer  loop  count  of  13  into  a  loop 
counter  register. 

3.  Store  an  inner  loop  count  of  13  into  a  loop 
counter  register. 

4.  Set  the  current  LOAD  matrix  element  to  zero. 

5.  Read  the  current  DDFL  matrix  element  from  memory 
into  the  A/B  registers. 

6.  Read  the  current  YZT  matrix  element  from  memory. 

7.  Call  the  floating  point  multiply  routine. 

8.  Read  the  current  LOAD  matrix  element  from  memory. 

9.  Call  the  floating  point  add  routine. 

10.  Store  the  result  into  the  LOAD  matrix  element. 

11.  Increment  the  DDFL  address  register. 

12.  Increment  the  YZT  address  register. 

13.  Decrement  the  inner  loop  counter  register. 

14.  If  the  counter  does  not  equal  zero,  go  back  to 
step  5. 

15.  Increment  the  LOAD  address  register. 

16.  Decrement  the  outer  loop  counter  register. 

17.  If  the  counter  does  not  equal  zero,  go  back  to 
step  4. 

18.  Return  to  the  calling  assembly  language  routine. 
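
The numbered steps can be traced with a small Python model in which "memory" is a flat list and the address and counter registers are ordinary variables. This is a hedged sketch, not the LOADS microcode of Appendix G. It assumes one list slot per floating point value, DDFL stored row by row, and `yzt_addr` and `load_addr` pointing at the first elements actually used. It also assumes that the inner loop count and the YZT address are re-initialized on every pass of the outer loop, a detail the steps as listed do not spell out.

```python
N = 13  # rows and columns of DDFL


def loads_steps(memory, ddfl_addr, yzt_addr, load_addr):
    """Follow the numbered design steps over a flat 'memory' list."""
    ddfl_ptr, load_ptr = ddfl_addr, load_addr        # step 1
    outer = N                                        # step 2
    while True:
        inner = N                                    # step 3
        yzt_ptr = yzt_addr                           # assumed rewind per row
        memory[load_ptr] = 0.0                       # step 4
        while True:
            a = memory[ddfl_ptr]                     # step 5
            b = memory[yzt_ptr]                      # step 6
            product = a * b                          # step 7
            total = memory[load_ptr] + product       # steps 8 and 9
            memory[load_ptr] = total                 # step 10
            ddfl_ptr += 1                            # step 11
            yzt_ptr += 1                             # step 12
            inner -= 1                               # step 13
            if inner == 0:                           # step 14
                break
        load_ptr += 1                                # step 15
        outer -= 1                                   # step 16
        if outer == 0:                               # step 17
            break
    return memory                                    # step 18


# Hypothetical layout: DDFL at 0..168, YZT at 169..181, LOAD at 182..194.
mem = [0.0] * 200
for r in range(N):
    mem[r * N + r] = 1.0            # identity matrix for an easy check
for i in range(N):
    mem[169 + i] = float(i + 1)
loads_steps(mem, 0, 169, 182)
```

With an identity matrix in place of DDFL, the LOAD area ends up holding a copy of the YZT values, which makes an off-by-one in either loop easy to spot.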


Implementation  of  LOADS 

The  process  of  implementing  the  design  of  LOADS  in 
microcode  is  straightforward.  Use  of  the  HP  microassembler 
(Ref. 32:5-1) makes the coding very similar to coding in
assembly  language.  The  resultant  microprogram  is  listed  in 
Appendix  G  along  with  the  modified  version  of  SPEED  called 
MSPED, required to invoke the microprogram.

Some  of  the  limitations  of  the  HP  architecture  have  a 
significant  effect  on  the  microcode.  Because  only  one 
register  is  available  for  subroutine  return  addresses  in 
the  HP  21MX  M-Series,  subroutine  calls  cannot  be  made  from 
other  subroutines  without  losing  the  original  return  ad¬ 
dress.  This  is  a  significant  problem  when  using  control 
store  ROM  routines  such  as  the  floating  point  multiply  and 
add  routines  required  by  LOADS.  These  routines  call  other 
ROM  routines,  so  they  cannot  be  used  directly.  One  way 
around  this  problem  is  to  duplicate  the  routines  in  WCS  and 
"jump"  directly  to  and  from  these  routines.  Duplicating 
the  routines  is  no  problem  as  all  the  ROM  routines  are 
documented (Ref. 32:Appendix E), but they do use much
valuable  WCS  space. 

Another  problem  with  using  the  ROM  routines  is  that 
they  use  many  of  the  available  scratch  registers.  For  ex¬ 
ample,  the  floating  add  and  multiply  routines  and  their 
associated  subroutines  use  ten  of  the  twelve  available 
scratch  registers.  Scratch  registers  are  very  important 
since  data  cannot  be  stored  in  WCS  in  the  HP  21MX.  With 
the  routines  in  WCS,  the  register  usage  can  be  reorganized 
somewhat.  Doing  this  in  LOADS  freed  two  more  registers. 

The plan for testing the LOADS microprogram consisted
of two major phases: a module test and a system test.
The module test was conducted on the AFIT HP system using
the special driver program SDRVR to drive LOADS (via STRES
and SPEED). The purpose of this test phase was to show
that  LOADS  would  produce  the  same  output  as  the  assembly 
language  code  segment  replaced  by  LOADS.  The  system  test 
was  run  on  the  AFWAL  wind  tunnel  control  computer.  The 
purpose  of  this  test  phase  was  to  show  that  the  LOADS  mi¬ 
croprogram  would  load  and  execute  correctly  on  the  system 
for  which  it  was  designed. 

Module  Test.  The  module  test  plan  consisted  of  two 
parts:  (1)  verification  of  the  program  output,  and  (2)  de¬ 
termination  of  the  speed  improvement  of  the  microprogram. 
Embedded in the data and assignment statements of SDRVR
were  known  inputs  for  the  subroutine  STRES,  which  would 
produce  known  stress  and  moment  calculation  outputs.  The 
goal  of  this  part  of  the  module  test  was  to  duplicate  those 
outputs  using  the  microprogrammed  version  of  the  program. 
HP's  Micro  Debug  Editor  (MDE)  (Ref.  32:5-21)  was  used  for 
loading  the  LOADS  microprogram  into  WCS,  and  for  debugging 
the  microprogram.  Debugging  consisted  of  setting  break¬ 
points  within  the  microprogram  and  analyzing  register  con¬ 
tents  when  the  breakpoints  were  reached. 

Determination of the speed improvement of STRES and
SPEED through the use of the LOADS microprogram was
accomplished through executive calls to read the system clock.
It  was  necessary  to  call  STRES  (and  therefore  SPEED)  100 
times  from  a  loop  in  order  to  get  accurate  measurements. 
The  timing  tests  showed  that  the  microprogrammed  version  of 
SPEED  (using  LOADS)  was  36  per  cent  faster  than  the  orig¬ 
inal  version.  Since  the  loop  replaced  by  LOADS  represented 
80  per  cent  of  the  total  execution  time  of  SPEED,  LOADS  was 
actually  45  per  cent  faster  than  the  loop  it  replaced.  The 
speed  increase  in  SPEED  made  its  calling  program  STRES  31 
per  cent  faster. 

The  45  per  cent  speed  increase  (almost  two  times  as 
fast)  is  somewhat  less  than  the  gains  of  six  to  ten  times 
(Ref.  5:98)  or  two  to  twenty  times  (Ref.  9:49)  reported  in 
the  literature.  Close  analysis  of  the  assembly  language 
for  the  loop  explains  the  difference.  Totaling  the  in¬ 
struction  times  for  floating  point  and  non-floating  point 
instructions  shows  that  46  per  cent  of  the  loop's  execution 
time  is  spent  in  the  two  floating  point  instructions. 
Since  the  microcoded  version  of  the  program  uses  these  same 
floating  point  routines,  no  speed  improvement  can  be  made 
to  46  per  cent  of  the  loop,  and  the  best  possible  speed 
improvement  to  the  loop  is  54  per  cent.  A  45  per  cent 
speed increase is therefore not only reasonable, but quite
good. 
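
The arithmetic behind these figures can be reproduced in a few lines. The sketch below reads "per cent faster" as a reduction in execution time, which is the interpretation under which the quoted numbers agree with one another; the function names are introduced here for illustration only.

```python
def loop_time_reduction(total_reduction, loop_share):
    """Time saved in the loop, as a fraction of the loop's own time."""
    return total_reduction / loop_share


def speed_ratio(time_reduction):
    """How many times as fast the code runs after the given time reduction."""
    return 1.0 / (1.0 - time_reduction)


# SPEED ran in 36 per cent less time, and the loop was 80 per cent of SPEED.
loop_reduction = loop_time_reduction(0.36, 0.80)    # about 0.45

# A 45 per cent time reduction is roughly 1.8 times as fast.
ratio = speed_ratio(loop_reduction)                 # about 1.82

# 46 per cent of the loop time is floating point work that is unchanged,
# so at most the remaining 54 per cent could be removed.
best_case = 1.0 - 0.46                              # about 0.54

print(round(loop_reduction, 2), round(ratio, 2), round(best_case, 2))
```

Measured against the 54 per cent ceiling, the 45 per cent reduction means that most of the non-floating-point time in the loop was eliminated.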

No  major  problems  were  encountered  in  the  module  test, 
although  several  small  problems  were  encountered.  One 
problem  was  a  logic  error  in  the  loop  structure  of  LOADS, 
which made it attempt to write to main memory beyond the
bounds  of  the  driver  program.  This  situation  caused  a 
system  memory  protect  error  detected  by  the  special  memory 
protect  hardware  on  the  AFIT  system.  This  optional  hard¬ 
ware  feature  showed  its  usefulness  in  protecting  memory 
from  an  untested  microprogram.  As  one  of  the  HP  manuals 
warns,  "execution  of  an  unproven  microprogram  can  have  un¬ 
predictable  and  undesirable  results,  including  the  de¬ 
struction  of  the  system"  (Ref.  32:5-16).  Once  the  logic 
error  of  LOADS  was  corrected,  the  program  produced  the 
correct  output,  and  the  module  test  was  successful. 

System  Test.  The  system  test  of  LOADS  consisted  of 
the  same  two  steps  as  the  module  test  —  output  verifica- 
tion  and  speed  improvement.  In  this  test,  however,  the 
inputs  to  the  program  were  from  the  operational  wind  tunnel 
control  system  hardware,  and  the  microprogram  was  driven  by 
the  operational  software.  Also,  the  speed  improvement 
measurement  here  was  concerned  with  the  number  of  addi¬ 
tional  rods  of  the  wind  tunnel  that  could  be  driven. 

Two  problems  were  encountered  in  the  system  test.  One 
problem  was  loading  the  microprogram  into  the  WCS  of  the 
wind  tunnel  computer,  and  the  other  was  executing  the  mi¬ 
croprogram  after  it  was  loaded. 

Loading the microprogram was a problem because of the
difference in operating systems of the AFIT and wind tunnel
machines. The AFIT system uses a real-time executive
system, RTE-III, and the wind tunnel system uses an older disk
operating system, DOS-III. Microprogramming support
software was available for the DOS-III system, but had not been
procured for the wind tunnel machine. The problem was
solved by writing a special WCS loader program in assembly
language using the I/O instruction sequences given in the
WCS manual (Ref. 33:3-1). The listing for this program,
called WCSLD, is in Appendix H. Initial attempts to run
WCSLD on the wind tunnel machine failed. The problem was
traced to a bad WCS board, and the microprogram was
successfully loaded to a new board. WCSLD will not run on an
RTE system with the memory protect option installed because
of the direct I/O instructions used.

The  next  problem  occurred  in  executing  the  LOADS  mi¬ 
croprogram  after  it  had  been  loaded.  The  program  seemed  to 
work,  but  zero  values  were  returned  for  the  stress  calcu¬ 
lations.  This  indicated  that  the  microprogram  was  not  be¬ 
ing  invoked  by  the  assembly  language  instruction  in  SPEED. 
This  problem  was  traced  to  an  improper  combination  of  ad¬ 
dress  jumper  wires  on  the  WCS  board.  Removal  of  two  jumper 
wires  fixed  this  problem,  and  the  program  ran  successfully. 

The  speed  improvement  measurement  was  made  by  inter¬ 
actively  increasing  the  number  of  concurrent  rod  adjust¬ 
ments,  and  monitoring  the  adjustment  process  on  a  special 
light  panel.  The  light  panel  readily  indicated  when  the 
program  bogged  down  because  of  too  many  concurrent  adjust¬ 
ments.  The  test  showed  that  four  rods  could  be  adjusted 
reliably  using  the  microprogrammed  version  of  the  program. 
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'  -  The  old  program  could  only  handle  three  reliably.  It  was 

felt  that  the  microprogrammed  version  was  probably  close  to 
five  rods,  but  this  could  not  be  validated. 

Summary 

This  chapter  covered  the  requirements,  design,  imple¬ 
mentation,  and  testing  of  a  microprogram  called  LOADS,  de¬ 
signed  for  use  in  the  wind  tunnel  control  program.  The 
application  of  the  microprogram  showed  a  31  per  cent  speed 
improvement in the stress calculation routine of the
control program. This improvement resulted in a 33 per cent
operational improvement of the rod adjustment process of
the tunnel, by increasing the capability from three to four
concurrent rod adjustments. The loading and execution of
LOADS on a DOS-III system showed that microprograms can be
run on this system without microprogramming software
support.


V. Requirements, Design, Implementation and Test of a

Microprogram for the Laser Materials Model Program


Introduction 

As  discussed  in  Chapter  II,  the  laser  materials  mod¬ 
eling  program  is  used  to  calculate  the  real  and  imaginary 
parts  of  refractive  index,  an  important  measure  of  laser 
materials. Chapter III covered the analysis of the routine
called  CALC  which  performs  this  refractive  index  calcula¬ 
tion.  The  analysis  showed  that  microprogramming  could  be 
applied  to  the  one  loop  in  CALC.  This  chapter  covers  the 
requirements,  design,  implementation  and  test  of  a  micro¬ 
program  called  MCALC  to  replace  this  loop  in  CALC. 


MCALC  Requirements 

The  real  and  imaginary  parts  of  the  refractive  index 
are  given  by  the  following  equations: 

These are the same two equations which were given in the
CALC requirements in Chapter II. The reader may wish to
refer back to that chapter for parameter descriptions and
dimensions, although they are not needed for the discussion
here.  CALC  receives  as  inputs  all  the  parameters  required 
to  evaluate  the  two  equations.  Once  evaluated,  the  equa¬ 
tions  can  be  solved  simultaneously  for  n  and  k,  the  real 
and  imaginary  parts  of  refractive  index. 

The data flow for CALC is shown in the level 1, 2, and
3 DFDs in Figure 7. Data flow name definitions are given
in  Table  X.  The  first  level  gives  the  overall  function  of 
CALC  —  to  compute  n  and  k.  Level  2  divides  this  function 
into  the  two  major  tasks  of  (1)  evaluating  the  two  equa¬ 
tions  and  (2)  solving  these  two  equations  simultaneously. 
Level  3  further  divides  the  first  task  into  four  subtasks. 
The  second  task  of  solving  the  equations  is  of  no  further 
interest  here,  since  this  task  is  not  to  be  microcoded,  and 
a  level  3  diagram  is  not  given. 

The  level  3  diagram  of  the  equation  evaluation  task, 
however,  is  of  particular  interest  since  it  provides  the 
basis  for  the  MCALC  microprogram.  As  noted  before,  this 
diagram  divides  the  equation  evaluation  task  into  four 
subtasks.  The  requirement  for  MCALC  is  to  perform  the 
summation  part  of  the  equation  evaluation  as  shown  in 
"bubble"  1.2.  This  "bubble"  then  becomes  the  first  level 
of  the  MCALC  DFD  shown  in  Figures  8  and  9. 

The first level of the MCALC DFD shows three inputs:
F, B, and F2. These inputs consist of all the equation
parameters required for the evaluation of the summation
parts of the two equations. One additional input to MCALC
which cannot be shown on a DFD is the upper limit of the
summation. This variable, called JJ in CALC, can have a
value between one and eight and serves as the loop count
for MCALC. JJ cannot be shown on a DFD because it is a
control variable rather than a data variable.
The two outputs of MCALC, C5 and C6, are the evaluated
summations in the $n^2 - k^2$ and $2nk$ equations respectively.

Figure 7. Level 1, 2, and 3 DFDs for Subroutine CALC

Figure 8. Level 1 and 2 DFDs for MCALC Microprogram
(Levels 3 and 4 of CALC)

The second level of the DFD divides the summation
computation into four intermediate computations, C1
through C4, and the two final computations of C5
and C6. All of the variable names used so far in the DFDs
are identical to those used in the original FORTRAN loop
shown in the listing of CALC given in Appendix I.
Descriptions of these variables and their relationships
to the original equations are given in Table X.

The third level of the DFD gives further details of
the computations of C1 through C6. Many of the data flow
names used here are for the purpose of the DFD only and are
not found in CALC or MCALC. These are also described in
Table X.

Design  of  MCALC 

The basic design of MCALC is shown in the structure
chart of Figure 10. Like the wind tunnel microprogram
LOADS, MCALC was designed using transform analysis of the
DFDs. The inputs B, F, and F2 are transformed into
intermediate calculations C1 through C4. C1, C3, and C4 are
then transformed into C5 and C6, the two outputs of MCALC.
Figure 11 consists of six structure charts which are the
result of second-level factoring (Ref. 30:180) or further
refinement of the structure chart of Figure 10. These six
structure charts show the calculations of C1 through C6.
As was pointed out in the previous chapter, the variable
names used in the structure chart do not actually exist in
the microcode, since all data or data addresses reside in
registers or main memory. Registers which are associated
with the variables are shown in parentheses below the
variable names.

Figure 10. MCALC Microprogram Structure Chart (Levels 1 and 2)

To  complete  the  design  of  MCALC,  detailed  algorithmic 
(register  transfer  language,  pseudo  English)  steps  are 
written  using  the  structure  of  the  structure  charts.  These 
steps  are  shown  below: 

1.  Initialization 

a.  Read  calling  parameters  from  registers/memory. 

b.  Save  parameters  in  appropriate  registers: 

X <-- B address, Y <-- JJ (loop count)

S <-- TMP1 address, S8 <-- TMP2 address
S12 <-- TMP3 address

2. Calculate C1. C1=GAMMA.NU(I)*F

a. Read GAMMA.NU element from B array into A/B.

b. Get F address from parameter list and put into S3.

c. Call FMPY (Floating Point Multiply).

d. Save C1 in TMP1.

3. Calculate C2. C2=NU(I)*NU(I)

a. Read NU element from B array into A/B.

b. Put NU address into S3.

c. Call FMPY.

d. Save C2 in TMP2.

4. Calculate C3. C3=C2-F2

a. Get F2 address from parameter list and put into S3.

b. Call FSUB (Floating Subtract). (C2 still in A/B).

c. Save C3 in TMP3.

[Figure caption: MCALC Microprogram Structure Chart (Level 2 Factor)]

5. Calculate C4. C4=(RHO(I)*C2)/(C3*C3+C1*C1)

a. Read C2 from TMP2 into A/B.

b. Put RHO element address into S3.

c. Call FMPY.

d. Save result (RHO.C2) in TMP2.

e. Read C3 from TMP3 into A/B.

f. Put C3 (TMP3) address into S3.

g. Call FMPY.

h. Save C3*C3 temporarily into C1.C3 location.

i. Read C1 from TMP1 into A/B.

j. Put C1 (TMP1) address into S3.

k. Call FMPY.

l. Put C3*C3 (C1.C3) address into S3.

m. Call FADD (Floating Add).

n. Save C3*C3+C1*C1 in C1.C3.

o. Read RHO.C2 from TMP2 into A/B.

p. Call FDV (Floating Divide).

q. Save resulting C4 in TMP2.

6.  Calculate  C5.  C5=C5+C3*C4 

a.  Put  C3  (TMP3)  address  into  S3. 

b.  Read  C4  from  TMP2  into  A/B. 

c.  Call  FMPY. 

d.  Read  C5  address  into  S3. 

e.  Call  FADD. 

f.  Save  new  C5  back  in  C5  location. 

7. Calculate C6. C6=C6+C1*C4

a. Read C1 from TMP1 into A/B.

b. Put C4 (TMP2) address into S3.

c. Call FMPY.

d. Read C6 address into S3.

e. Call FADD.

f. Save new C6 back in C6 location.

8.  Check  for  completion. 

a.  Decrement  loop  counter  (Y-reg). 

b.  If  the  counter  does  not  equal  0,  go  to  step  2. 

c.  Return  to  calling  assembly  language  routine. 
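
The eight steps above amount to a single accumulation loop.
The following is a minimal Python sketch of that loop, not
the microcode itself. It assumes that the GAMMA.NU, NU, and
RHO elements are available as three separate lists (the
actual layout of the B array is not described here) and that
C5 and C6 start at zero. The function and argument names
are illustrative only.

```python
def mcalc(gamma_nu, nu, rho, f, f2, jj):
    """Return (C5, C6), the two summations accumulated over jj terms."""
    c5 = 0.0  # assumed initial value of the first summation
    c6 = 0.0  # assumed initial value of the second summation
    for i in range(jj):                            # steps 1 and 8: loop count JJ
        c1 = gamma_nu[i] * f                       # step 2
        c2 = nu[i] * nu[i]                         # step 3
        c3 = c2 - f2                               # step 4
        c4 = (rho[i] * c2) / (c3 * c3 + c1 * c1)   # step 5
        c5 += c3 * c4                              # step 6
        c6 += c1 * c4                              # step 7
    return c5, c6


# One-term example with made-up input values.
c5, c6 = mcalc([1.0], [2.0], [3.0], f=1.0, f2=2.0, jj=1)
```

Counting the arithmetic in the loop body gives seven
multiplies, three adds, one subtract, and one divide per
pass.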

Implementation  of  MCALC 

MCALC was implemented in microcode, using a short
assembly language routine called ACALC to handle the
interface between the FORTRAN-coded CALC and the microcoded
MCALC. Listings for these three programs are in Appendix J.

The implementation of MCALC in microcode was very
difficult because of the number of floating point
operations required: three adds, one subtract, seven
multiplies, and one divide. This requirement created two
major problems: a subroutine return address problem and a
WCS space problem.

As discussed in the previous chapter, the microcoded
floating point routines in ROM cannot be used directly
because of the problem of multi-level subroutine calls in
the M-Series machine: only one return address can be
stored in the SAVE register. This problem was solved in
the LOADS microprogram by duplicating the subroutines in
WCS and modifying them to return to a fixed address in the
calling program. This direct return technique was possible
in LOADS because only one call was needed to the floating
point multiply routine and one call to the floating point
add routine.

This direct return technique would also have worked in
MCALC for the one divide, but not for the adds, multiplies,
and the one subtract. The subtract routine is actually
part of the add routine, so it is also effectively called
several times. If a subroutine is called from more than one
address in the program, then the return address must
somehow be saved, or the subroutine must have some way of
modifying a fixed return address. Another register could be
used to store the extra return address, to augment the
SAVE register, but there are no microinstructions to make
the transfer into the SAVE register. The problem was
finally solved by storing all the return addresses of the
microprogram in a table and coding a "jump" to the
beginning of the table modified by an offset saved in the
instruction register. Thus, in a sense, the return address
was stored in the instruction register.
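
The table-driven return can be illustrated with a toy
microsequencer. The sketch below is in Python and is only a
model of the idea: the opcodes SETIR, JMP, JTAB, MARK, and
HALT are invented for the illustration and are not 21MX
micro-orders, and it assumes that each table entry is
itself a jump to one return point.

```python
def run(code, table_base):
    """Execute the toy microprogram and return the list of marks visited."""
    pc, ir, trace = 0, 0, []
    while True:
        op, arg = code[pc]
        if op == "SETIR":        # save this call site's table offset
            ir = arg
            pc += 1
        elif op == "JMP":        # unconditional jump
            pc = arg
        elif op == "JTAB":       # jump to table start modified by the offset
            pc = table_base + ir
        elif op == "MARK":       # stand-in for real work; records a label
            trace.append(arg)
            pc += 1
        elif op == "HALT":
            return trace
        else:
            raise ValueError("unknown op: " + op)


CODE = [
    ("SETIR", 0),           # 0: first call site selects table entry 0
    ("JMP", 7),             # 1: enter the shared subroutine
    ("MARK", "after A"),    # 2: return point for the first call
    ("SETIR", 1),           # 3: second call site selects table entry 1
    ("JMP", 7),             # 4: enter the shared subroutine again
    ("MARK", "after B"),    # 5: return point for the second call
    ("HALT", None),         # 6
    ("MARK", "FMPY"),       # 7: body of the shared subroutine
    ("JTAB", None),         # 8: "return" through the table
    ("JMP", 2),             # 9: table entry 0
    ("JMP", 5),             # 10: table entry 1
]
TABLE_BASE = 9

trace = run(CODE, TABLE_BASE)
```

The subroutine is entered from two different addresses and
returns correctly to each, although no return address is
ever stored in a SAVE-like register.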

The WCS space problem was created by the apparent
necessity to duplicate all of the floating point routines
in WCS. Duplicating all four routines required 146 words
of the 256-word WCS, which did not leave enough room for
the rest of the program. This problem was solved by
duplicating all but the divide routine, saving 53 words.
This was enough to allow the 256-word microprogram to fit
in WCS. The one divide was accomplished by microcoding
instructions to load the divide arguments into their proper
registers and main memory locations, and then returning to
a macroinstruction to perform the divide. The divide
macroinstruction was then followed by another
macroinstruction which reinvoked the microprogram at the
continuation point. This "trick" allowed control of the
loop to remain at the microcode level, even though the
divide was initiated by a macroinstruction.
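
The word counts quoted above imply the following budget,
worked here as a short Python calculation. The variable
names are illustrative; only the three input figures come
from the text.

```python
WCS_WORDS = 256          # size of the writable control store
ALL_FP_ROUTINES = 146    # words needed to duplicate all four routines
DIVIDE_SAVING = 53       # words saved by not duplicating the divide

# Words taken by the add, subtract, and multiply routines alone.
fp_without_divide = ALL_FP_ROUTINES - DIVIDE_SAVING

# Room left for the rest of the microprogram in each case.
room_with_divide = WCS_WORDS - ALL_FP_ROUTINES
room_without_divide = WCS_WORDS - fp_without_divide
```

Under these figures, dropping the divide routine raises the
space available to the rest of the microprogram from 110 to
163 words.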

Testing 

Testing of MCALC consisted of a module test only.
This test was run on the AFIT HP system using the special
driver program CDRVR to provide the necessary inputs to
MCALC (via CALC and ACALC). The test had two major
purposes: (1) to verify correct output, and (2) to measure
the speed increase resulting from microprogramming.

Output Verification. The test data used for the
output verification phase of the test was obtained from
AFWAL/MLPJ personnel. Typical values for each of the
equation parameters were chosen. The method of
verification was simply to compare the outputs of the
microprogrammed version of the program to those of the
non-microprogrammed version.

As expected, the program did not produce the correct
outputs the first time, and debugging was necessary.
Debugging was severely hampered by the size of MCALC. The
Micro Debug Editor (MDE), which was very useful in the
debugging of the LOADS microprogram, was much less useful
here. If breakpoints are used in the debugging process,
MDE requires almost half of the 256-word WCS space to
operate. This meant that MCALC had to be segmented into
overlays and loaded into WCS in parts. This overlay
technique of debugging was found to be very frustrating and
prone to human error. Breakpoints in each overlay had to
be carefully planned, so that the next overlay could be
loaded. Segmenting the program into overlays also required
keeping two separate versions. This led to several false
indications when the two apparently identical versions
(except for overlaying) gave different results. The
debugging difficulty was compounded even further by the
fact that MCALC was a loop.

Because of the problems of using the overlays with the
MDE, the overlay debugging technique was largely abandoned
for a higher-level approach. Under this approach, the
entire MCALC microprogram was loaded, and the MDE was not
used for setting breakpoints. Inputs were modified in the
FORTRAN driver to detect corresponding changes in the
microprogram output. If it was necessary to examine a
micro-level register, the microcode was modified slightly
to pass the register value back to the FORTRAN driver
through a main memory address. The MDE was still useful for
making small changes to microcode. This saved editing and
reassembly of the microprogram source, and also kept the
source and object files free of debugging code. This
higher-level debugging approach was successful, and the
microprogrammed version finally produced the expected
outputs, completing the first phase of the module test.

Speed  Measurement.  The  speed  measurements  of  MCALC 
and  the  FORTRAN  loop  replaced  by  MCALC  were  accomplished 
using  executive  calls  to  read  the  system  clock.  This  was 
the  same  technique  that  was  used  on  the  LOADS  microprogram. 
The  routines  were  called  100  times  from  a  loop  to  get  ac¬ 
curate  timing  measurements.  The  measurements  showed  the 
microprogrammed  routine  to  be  about  ten  per  cent  faster 
than  the  FORTRAN  routine. 
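
The measurement technique (repeat the routine many times
between two clock readings) can be sketched in Python as
follows. This is an illustration only: the original used
executive calls on the HP system, whereas the sketch uses
the standard `time.perf_counter` clock, and it assumes that
"per cent faster" means the percentage reduction in
execution time.

```python
import time


def time_calls(routine, repetitions=100):
    """Return the average wall-clock seconds per call of routine."""
    start = time.perf_counter()
    for _ in range(repetitions):
        routine()
    return (time.perf_counter() - start) / repetitions


def percent_faster(t_old, t_new):
    """Percentage reduction in execution time from t_old to t_new."""
    return 100.0 * (t_old - t_new) / t_old
```

Repeating the call amortizes the clock resolution over many
executions, which matters when a single call is short
compared to one clock tick.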

As in the speed improvement of the LOADS microprogram,
this ten per cent speed increase was significantly less
than the gains of six to ten times (Ref. 5:98) or two to
twenty times (Ref. 9:49) reported in the literature. Close
analysis of the assembly language generated for the FORTRAN
routine provided the answer to this apparent disparity.
Totaling the instruction times for floating point and
non-floating point instructions showed that 66 per cent of
the routine's execution time was spent in the floating
point instructions. Since the microcoded version of the
program used these same floating point routines, no speed
improvement could be made to 66 per cent of the program.
This meant that even if all the non-floating point
instructions could have been eliminated, the speed gain
would still have been only 34 per cent! Thus, the gain of
ten per cent was reasonable for this particular program.
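
The bound above follows from treating the floating point
time as fixed and speeding up only the remainder. A short
Python sketch of the arithmetic is given below. It assumes
that "speed gain" is read as the fractional reduction in
execution time; the function name is illustrative.

```python
def tuned_time(fixed_fraction, tunable_speedup):
    """Relative execution time after speeding up only the tunable part."""
    return fixed_fraction + (1.0 - fixed_fraction) / tunable_speedup


# Limiting case: the non-floating point instructions take no time at all.
best_gain = 1.0 - tuned_time(0.66, float("inf"))

# Speedup of the non-floating point part that yields a ten per cent gain.
needed = (1.0 - 0.66) / (0.90 - 0.66)
```

Under this reading, the observed ten per cent gain
corresponds to the non-floating point portion running
roughly 1.4 times faster in microcode.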

Summary 

This chapter covered the requirements, design,
implementation, and testing of a microprogram called
MCALC, designed for use in the laser materials modeling
program. The application of the microprogram showed a ten
per cent speed improvement in the refractive index
calculation routine of the modeling program. This small
improvement was due to the high ratio of floating point to
non-floating point instructions in the program. The
improvement was not great enough to produce an operational
improvement of the program, and thus the microprogram was
not tested on the operational machine.


VI. Automating the Tuning Process


Introduction 

The microprogramming tuning technique used in the wind
tunnel control program and the laser materials modeling
program is largely a manual technique. Except for the
generation of the program activity profile, all of the
tuning processes must be accomplished manually by the
programmer. This technique, while effective, is slow and
costly, and requires microprogramming expertise. These
disadvantages motivate the study of automatic tuning
techniques. The purpose of this chapter is to review the
research that has been done in the area of automatic
tuning and, with this background, to discuss the
feasibility of developing an automated tuning system on the
AFIT HP 21MX computer.

Background 

The  literature  search  done  for  this  thesis  investiga¬ 
tion  revealed  that  several  researchers  (Refs.  13-19)  had 
done  work  in  the  area  of  automatic  tuning  of  computer  ar¬ 
chitectures.  Three  different  tuning  approaches  from  the 
literature  are  presented  here. 

Tuning Approach #1. The first approach presented is
one by K.A. El-Ayat and J.A. Howard (Ref. 17). In their
approach the tuning process is divided into three major
steps: (1) performance monitoring and measurement, (2)
analysis of the data of the first step, and (3) synthesis
of microprograms to improve deficiencies found in the
second step. The goals of this approach are "significant
improvement in performance, low implementation cost
(overhead) and minimal human intervention" (Ref. 17:86).

The  performance  monitoring  and  measurement  step  is 
essentially  an  enhanced  version  of  the  activity  profile 
generation  used  in  this  thesis  study.  As  discussed  in 
Chapter  III,  the  monitoring  can  be  done  with  hardware, 
software,  or  microcode.  Here,  the  step  is  done  with  a  very 
short  (eight  lines)  microprogram,  presumably  added  to  the 
instruction  fetch  routine.  The  result  of  the  performance 
monitoring  and  measurement  is  an  instruction  trace  and  a 
trace  of  data  referencing  patterns.  The  instruction  trace 
indicates  where  a  program  should  be  tuned,  and  the  data 
trace  indicates  which  data  items  should  be  stored  in 
micro-level  registers  to  eliminate  main  memory  fetches. 

The purpose of the analysis step is to analyze the
data from the first step to determine where the program
should be tuned. Two types of program segments are
selected for tuning: loops and non-loops. The loop segment
is a set of instructions which is terminated by a branch
back to the first instruction. The non-loop segment can be
terminated by either a branching or non-branching
instruction. In the non-branching case, termination is
indicated when the profile activity of that instruction is
less than that of the preceding instruction. If the
segment is terminated by a branching instruction, the
branch cannot be back to the first instruction.

In the analysis algorithm, the execution profile is
searched, and each instruction execution count is compared
to a minimum preset threshold count. If the threshold is
met, the instruction corresponding to the execution count
is selected as the first instruction of the segment.
Subsequent instructions are examined to determine the end
of the segment and the segment type. A segment must also
meet a preset minimum size threshold (minimum number of
machine instructions). The resulting output of the analysis
step is a set of program segments ordered by segment type,
size, and execution frequency. This ordering assures that
segments having the greatest potential for performance
improvement are tuned first, since the limited WCS space
may prevent the tuning of all the selected segments.
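
The selection rules just described can be sketched in
Python as follows. This is a reading of the prose above,
not the published algorithm of Ref. 17. In particular it
assumes that a non-loop segment ends just before the
instruction whose count drops, that loops are ordered
ahead of non-loops, and that larger and more frequently
executed segments come first. The `Instr` and `Segment`
types are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    count: int                        # execution count from the profile
    branch_to: Optional[int] = None   # index of the branch target, if any


@dataclass
class Segment:
    kind: str     # "loop" or "non-loop"
    start: int    # index of the first instruction
    end: int      # index of the last instruction (inclusive)
    count: int    # execution count of the first instruction

    @property
    def size(self):
        return self.end - self.start + 1


def find_segments(prog, min_count, min_size):
    segments, i = [], 0
    while i < len(prog):
        if prog[i].count < min_count:     # below the count threshold
            i += 1
            continue
        start = j = i
        kind = "non-loop"
        while True:
            if prog[j].branch_to is not None:      # a branch ends the segment
                if prog[j].branch_to == start:
                    kind = "loop"
                break
            if j + 1 == len(prog) or prog[j + 1].count < prog[j].count:
                break                              # profile activity drops
            j += 1
        seg = Segment(kind, start, j, prog[start].count)
        if seg.size >= min_size:                   # minimum size threshold
            segments.append(seg)
        i = j + 1
    # Assumed ordering: loops first, then larger, then more frequent.
    segments.sort(key=lambda s: (s.kind != "loop", -s.size, -s.count))
    return segments


# Made-up profile: a three-instruction loop followed by a short hot run.
prog = [Instr(1), Instr(1), Instr(100), Instr(100), Instr(100, branch_to=2),
        Instr(1), Instr(50), Instr(50), Instr(10)]
found = find_segments(prog, min_count=40, min_size=2)
```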

The final step of this tuning approach is the
synthesis of the microprograms and the machine language
instructions which invoke the microprograms. Program
segments are taken in order from the analysis step, and
checked to see if the corresponding microcode will fit in
the WCS. If so, the first machine instruction of the
segment is replaced with an instruction which invokes the
microprogram. This instruction is followed by the segment
operand addresses. Each instruction is translated into
microcode using the instruction opcode as the translation
key. The microcode is then loaded into WCS, ready for
execution. Loop segments require extra microcode to
initialize working micro-level registers, which store loop
variables, frequently used operands, and intermediate
results.

The  synthesis  step  also  includes  microcode  optimiza¬ 
tion.  The  optimization  applied  here  eliminates  unnecessary 
instruction  and  data  fetches,  makes  use  of  local  store  and 
emit  fields  within  microinstructions,  eliminates  redundant 
and  negated  microoperations,  and  uses  parallel  microopera¬ 
tions  when  possible. 

Results of tuning experiments using this approach show
that the speed of loop segments increases by 4 to 8 times,
and that of non-loop segments by 1.7 to 4 times. The speeds
of the overall programs show a 30 to 45 per cent
improvement.

Tuning Approach #2. The second tuning approach
presented here is one by Philip S. Liu and Frederic J.
Mowle (Ref. 18). It is actually four separate methods of
tuning that they have studied: (1) "Static Loading of Inner
Loops," (2) "Selective (and Static) Loading of Inner
Loops," (3) "Dynamic Overlaying of Inner Loops," and (4)
"User Aided." The first three methods consider only inner
loops of programs as candidate segments for
microprogramming. The candidate segments of the fourth
method can be either loop or non-loop segments.

The first method requires the compiler to identify all
the inner loops of the program. The loops are then
converted to microcode in the order that they appear in the
program. Data items within the loops, both variables and
constants, are mapped into available micro-level registers.
If not enough registers are available, the most frequently
used data items are mapped first, and the remaining items
are accessed from main memory. The conversion process,
which can be done at the source or object code levels,
continues until the WCS is filled. The major drawback of
this first method is that the WCS can be filled before all
the loops have been converted. Since the loops are taken
in the order that they appear in the program, some
time-consuming loops may be omitted.

The second method remedies the drawback of the first
method by requiring the compiler to assign priorities to
the inner loops. Loops are then converted and loaded into
the WCS on a priority basis. The priority of an inner loop
is equal to its number of outer loops. The assumption here
is that the inner loop with the greatest number of outer
loops will be executed the most times, and should be given
the highest priority. Inner loops with equal priority are
converted to microcode in order of size, the one with the
most object instructions taken first. With this second
method all the inner loops may still not fit in the WCS,
but at least the most important ones are loaded first.
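
The difference between the first two methods is only the
order in which loops are offered to the WCS. A minimal
Python sketch is given below. The `InnerLoop` record and
the example sizes are hypothetical, and the sketch assumes
that a loop which does not fit is skipped rather than
halting the process.

```python
from dataclasses import dataclass


@dataclass
class InnerLoop:
    name: str
    outer_loops: int    # number of enclosing loops (the priority)
    object_size: int    # number of object instructions
    wcs_words: int      # size of the converted microcode


def load_in_order(loops, wcs_size):
    """Load loops in the given order while they still fit in the WCS."""
    loaded, free = [], wcs_size
    for loop in loops:
        if loop.wcs_words <= free:
            loaded.append(loop.name)
            free -= loop.wcs_words
    return loaded


def by_priority(loops):
    """Order by nesting depth, then by object size, largest first."""
    return sorted(loops, key=lambda l: (-l.outer_loops, -l.object_size))


loops = [InnerLoop("A", 1, 60, 100),
         InnerLoop("B", 2, 20, 120),
         InnerLoop("C", 2, 30, 100)]
static = load_in_order(loops, 256)                  # method 1: program order
selective = load_in_order(by_priority(loops), 256)  # method 2: priority order
```

With these made-up sizes, the first method loads A and B and
leaves out the deeply nested loop C, while the second method
loads C and B and leaves out the shallow loop A.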

The third method ensures that all inner loops of the
program can be loaded into the WCS, but not all at the same
time. This method works like a cache memory system where
the main memory is divided into blocks, and a block is
loaded into the faster cache memory when it is needed. In
this third method, all the inner loops are converted to
microcode, each given an identification number, and stored
in main memory. When a loop is needed, the identification
number of the one currently in the WCS is checked. If the
needed loop is not in the WCS, then it is loaded over the
one currently in the WCS. The major problem with this
method is the overhead of swapping microcode in and out of
the WCS. This overhead can be quite high because the WCS
word length is usually greater than that of the main memory
word, requiring two, three, or even more main memory word
transfers for one WCS word. The speed gain of the
microcoded loops has to be great enough to offset this
overhead.
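
The overlay check and its transfer overhead can be modeled
in a few lines of Python. The class below is a hypothetical
illustration with a single resident loop, as in the
description above. The loop sizes and the ratio of two main
memory words per WCS word are made-up values.

```python
class OverlayWCS:
    """Single-slot overlay: one microcoded loop resident at a time."""

    def __init__(self, wcs_words_per_loop, memory_words_per_wcs_word):
        self.sizes = wcs_words_per_loop      # {loop id: size in WCS words}
        self.ratio = memory_words_per_wcs_word
        self.resident = None                 # id of the loop now in WCS
        self.words_transferred = 0           # main memory words moved so far

    def invoke(self, loop_id):
        # Check the identification number; load only on a mismatch.
        if self.resident != loop_id:
            self.words_transferred += self.sizes[loop_id] * self.ratio
            self.resident = loop_id


wcs = OverlayWCS({1: 40, 2: 60}, memory_words_per_wcs_word=2)
for loop_id in [1, 1, 2, 1]:
    wcs.invoke(loop_id)
```

In this example three of the four invocations cause a load,
moving 280 main memory words in total. A program that
alternates between two loops pays this cost on every switch.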

The first three methods assume no a priori knowledge
about the execution of the program. The fourth method
assumes that the user has such knowledge about the program.
This method allows the user to specify the program segments
to be microcoded and the order in which they are
microcoded. All the microcoded segments can be initially
loaded into the WCS as in the first and second methods, or
they can be dynamically overlaid as in the third method.

All four of the above methods were tested with six
arbitrarily-chosen FORTRAN programs. The resulting speed
gains are shown in Figure 12 (Ref. 18:Fig. 6) as functions
of the WCS size. As shown, the fourth method produced the
best program improvement, because of the human
intervention. The second method, however, did almost as
well with no human intervention. With a small WCS size, the
third or overlay method did the best. Liu and Mowle
recommend a combination of the third and fourth methods,
the dynamic overlay and user aided methods.

Tuning Approach #3. The third tuning approach is
one used in a system designed and implemented by K.
Sakamura, T. Morokuma, H. Aiso, and H. Iizuka (Ref. 19). A
model of the system is shown in Figure 13.

The  model  shows  that  the  system  consists  of  a  comput¬ 
er,  a  monitor,  an  analyzer,  a  data  base,  and  a  feedback 
mechanism.  The  computer  executes  the  program  (or  problem). 
The  monitor  collects  information  on  the  relative  frequen¬ 
cies  of  machine  instructions,  sequences  of  instructions 
(serial  dependencies),  and  address  and  data  values.  The 
analyzer  uses  this  information  obtained  by  the  monitor  to 
determine  which  segments  of  the  program  should  be  micro- 
coded  in  order  to  speed  up  the  program  execution.  The  an¬ 
alyzer  then  synthesizes  these  new  microcoded  instructions. 
The  feedback  path  is  used  to  write  the  newly  synthesized 
microprograms  into  the  WCS,  thus  tuning  the  architecture  of 
the  computer.  The  data  base  for  learning  stores  informa¬ 
tion  about  previous  iterations  of  the  tuning  process.  The 
analyzer  can  refer  to  this  information  in  order  to  minimize 
the  number  of  iterations. 

An experimental system has been implemented using an
HP 2100 computer with a 1K x 24-bit control store (0.25K
ROM and 0.75K WCS). The monitor is a DYNAPROBE 7900+3000
hardware monitor, and a PDP-11V03 is used as the analyzer
and synthesizer. A block diagram is shown in Figure 14.

As  the  program  under  test  executes,  the  DYNAPROBE 
monitors  the  execution  and  feeds  the  information  directly 
to  the  PDP-11V03.  The  PDP-11V03  analyzes  the  execution 
information,  synthesizes  the  new  instructions,  and  passes 
these  new  instructions  to  the  HP  2100  through  the  I/O  in¬ 
terface. 

Since  separate  hardware  is  used  for  both  the  monitor 
and  analysis  functions,  there  is  very  little,  if  any, 
overhead  in  the  tuning  process.  The  result  of  the  experi¬ 
mental  system  is  a  30  to  60  per  cent  improvement  in  execu¬ 
tion  time  of  the  tuned  programs  over  the  originals. 

Review  of  the  Three  Approaches.  The  three  ap¬ 
proaches  discussed  are  quite  different  from  each  other,  but 
they  share  two  common  steps:  (1)  automatic  determination  of 
the  program  segments  to  microprogram,  and  (2)  automatic 
synthesis  of  the  microprograms. 

The first approach uses a microprogram to precisely
monitor program execution. The program is divided into
loop and non-loop segments, and the execution data is used
to determine which segments to microprogram. In the second
approach the process of determining which segments to
microprogram is simplified by choosing only inner loop
segments or other segments specified by the user. The third
approach uses a hardware monitor to obtain program
execution data, and uses a separate computer to perform the
analysis and determine which segments to microprogram.
This approach is the most sophisticated of the three, but
also requires the most hardware. The first approach is
next in sophistication, but requires modification of the
computer's microcoded fetch routine. The second approach
is by far the simplest and requires no extra hardware or
special modifications to the computer firmware.

[Figure caption: Functional Configuration of the
Experimental System]

While  all  three  approaches  use  automatic  microprogram 
synthesis,  only  the  paper  on  the  first  approach  covers  the 
actual  algorithm  used  to  perform  this  step.  The  software 
which  implements  the  algorithm  resides  in  the  machine  run¬ 
ning  the  program,  and  the  synthesis  step  is  performed  in  an 
"off-line"  mode.  The  third  approach  uses  a  separate  com¬ 
puter  to  perform  this  step,  the  same  machine  that  performed 
the  analysis.  In  this  approach  the  synthesis  (and  analy¬ 
sis)  is  performed  while  the  application  program  is  running, 
and  the  new  microprograms  are  transferred  back  to  the  ap¬ 
plication  machine  through  a  feedback  loop.  Thus,  the  syn¬ 
thesis  is  an  iterative  process  performed  in  an  "on-line" 
mode.  Details  of  the  process  in  the  second  approach  are 
not  given.  Again,  the  third  approach  seems  to  be  the  most 
sophisticated  at  the  expense  of  more  hardware. 

The  performance  results  of  the  three  approaches  are 
similar.  The  first  approach  showed  performance  improve¬ 
ments  of  30  to  45  per  cent,  and  the  third  approach  showed 
improvements  of  30  to  60  per  cent.  The  second  approach  had 
similar  gains,  although  they  are  given  as  ratios  of  non- 
tuned  to  tuned  execution  times,  rather  than  percentages. 


Automating  the  AFIT  HP  21MX  System 

The  background  information  on  the  three  automatic 
tuning  systems  provides  a  good  perspective  for  examining 
the  feasibility  of  an  automated  tuning  system  on  the  AFIT 
HP  21MX  system.  The  following  discussion  is  intended  to 
present  the  general  requirements  and  some  possible  ap¬ 
proaches  to  developing  such  a  system. 

General  Requirements.  The  general  requirements  can 
be  given  in  terms  of  user-system  interface,  system  input 
and  output,  and  performance  objectives. 

The users of this system are expected to be competent
programmers in higher level languages, mainly FORTRAN,
since this is the major higher level language used on the
HP 21MX at Wright-Patterson. They may or may not have
experience with HP 21MX assembly language, and probably do
not have microprogramming experience. The tuning system
should be designed with these experience levels in mind.
The system does not have to be totally automatic with no
user interaction, as is the one in the third approach
discussed. In fact, an interactive system may be
preferable, as suggested by the "user aided" method in the
second approach. The system should, however, be
user-friendly and should make the details of the
microprogramming and the micro-level architecture as
transparent as possible to the user.


The inputs to the system consist of all of the
available files associated with the program (FORTRAN or
assembly language) being tested, plus interactive inputs
from the user. The program files include the following:
source, relocatable object, memory image, listing, and load
map files. Not all of these files may be needed to
accomplish the tuning process, but they are listed as
possible inputs. The one output of the system is an
executable memory image file of the tuned program.

A performance objective is an important requirement
for the system, but it is difficult to specify a program
speed improvement figure that the system should be able to
meet. The amount of improvement of a given program is
largely dependent on the characteristics of that program.
This is true for manual tuning as well as automatic tuning.
A 25 to 30 per cent improvement is probably a reasonable
performance objective for an automatic system, as indicated
by the results of the three approaches discussed. Anything
below this is probably operationally insignificant for most
programs.

Possible  Approaches.  As  discussed,  the  three  ap¬ 
proaches  share  two  common  steps  in  the  tuning  process:  (1) 
automatically  (with  possible  user-interaction)  determining 
which  segments  of  the  program  to  microprogram,  and  (2)  au¬ 
tomatically  synthesizing  the  microprograms.  Possible  ap¬ 
proaches  to  automating  the  AFIT  system  are  discussed  in 
terms  of  accomplishing  these  two  basic  tuning  steps. 

The activity profile generator program (ACTV)
partially satisfies the requirement to determine which
program segments to microprogram. ACTV in its present form
divides a program under test into 50 equal intervals and
determines a profile count for each interval. These
equally divided intervals do not, however, correspond to
the logical segments of the program (the loops and
non-loops, for example). The profile counts for these equal
intervals must be converted into counts for the logical
segments. This requires first the determination of the
address boundaries of the logical segments, and then the
correlation of these boundaries to the profile interval
boundaries.

An  example  may  help  explain  the  process.  Figure  15 
contains  a  block  diagram  and  partial  execution  profile  of  a 
program  with  three  segments  —  a  non-loop  followed  by  a 
loop,  which  is  followed  by  another  non-loop.  The  address 
boundaries  in  octal  for  the  three  segments  are  40000  to 
40250,  40250  to  40330,  and  40330  to  40620  respectively. 
Finding  the  segment  boundaries  requires  an  algorithm  which 
analyzes  the  branching  instructions  of  the  program.  The 
example  shown  contains  at  least  one  branching  instruction, 
a "jump" from the end of the loop segment back to the beginning.
No software currently exists at AFIT to perform
the segment-bounding task on the HP system, but such software
should not be difficult to write using an existing algorithm.
El-Ayat and Howard, authors of the first approach discussed,
describe one algorithm for finding segment boundaries
(Ref. 17:86). Their algorithm requires a very precise
program activity profile, however, and would have to
be  modified  to  work  with  the  statistical  profile  provided 
by  ACTV.  Liu  and  Mowle,  the  authors  of  the  second  ap¬ 
proach,  mention  another  technique  called  "control  flow 
analysis"  for  detecting  loops  in  a  program,  but  do  not  give 
details  of  the  technique  (Ref.  18:817).  User  interaction 
may  also  be  beneficial  in  finding  segment  boundaries  of  a 
program. 
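
As a rough illustration of the segment-bounding task, the sketch below derives loop and non-loop segments from backward jumps alone. It is not the El-Ayat and Howard algorithm, nor the control flow analysis mentioned by Liu and Mowle. The function name, the (jump address, target address) input form, and the jump location 40327 are assumptions made for the example; only the segment boundaries come from the Figure 15 example.

```python
def find_segments(start, end, jumps):
    """Split the address range [start, end) into loop and non-loop segments.

    jumps: list of (jump_address, target_address) pairs. A backward jump
    (target <= jump address) is treated as closing a loop that runs from
    the target through the jump itself. Nested or overlapping loops are
    not handled in this sketch.
    """
    loops = sorted((t, j + 1) for j, t in jumps if t <= j)
    segments = []
    cursor = start
    for lo, hi in loops:
        if lo < cursor:                 # overlaps an earlier loop; skip it
            continue
        if lo > cursor:
            segments.append((cursor, lo, "non-loop"))
        segments.append((lo, hi, "loop"))
        cursor = hi
    if cursor < end:
        segments.append((cursor, end, "non-loop"))
    return segments


# Boundaries of the Figure 15 example, with a hypothetical jump
# at 40327 back to 40250 (all addresses in octal).
for lo, hi, kind in find_segments(0o40000, 0o40620, [(0o40327, 0o40250)]):
    print(f"{lo:o} to {hi:o}: {kind}")
```

A real implementation would also have to decode the branching instructions from the memory image and deal with conditional skips and subroutine calls, which this sketch ignores.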

When  the  segment  boundaries  are  found,  the  conversion 
of  interval  counts  to  segment  counts  can  be  done.  The  ac¬ 
tivity  profile  in  the  figure  shows  counts  or  "hits"  for 
several  of  the  52  intervals  (50  intervals  in  the  program 
range  and  two  outside).  The  bracketed  "hits"  show  the 
mapping of the profile intervals to the logical segments of
the  program.  From  a  mapping  such  as  this,  the  profile 
counts  for  each  logical  segment  can  be  determined,  and  a 
decision  on  which  segments  to  microprogram  can  be  made. 
This  process  can  easily  be  done  by  software  given  all  the 
input  information  shown  in  the  figure. 
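
The conversion of interval counts to segment counts might be sketched as follows. The function name and the hit counts are hypothetical (the counts in Figure 15 are not reproduced here), and splitting a straddling interval's count in proportion to the overlap is an assumption of this sketch, not a rule taken from ACTV. Only the 50 in-range intervals are considered.

```python
def segment_counts(interval_counts, start, end, segments):
    """Apportion equal-interval profile counts to logical segments.

    interval_counts: hits for the equal intervals covering [start, end).
    segments: list of (low, high) address pairs, high exclusive.
    A count whose interval straddles a segment boundary is split in
    proportion to the overlap (an assumption of this sketch).
    """
    width = (end - start) / len(interval_counts)
    totals = [0.0] * len(segments)
    for i, hits in enumerate(interval_counts):
        lo = start + i * width
        hi = lo + width
        for s, (seg_lo, seg_hi) in enumerate(segments):
            overlap = min(hi, seg_hi) - max(lo, seg_lo)
            if overlap > 0:
                totals[s] += hits * overlap / width
    return totals


# Segment boundaries from the example (octal); hit counts are made up.
SEGMENTS = [(0o40000, 0o40250), (0o40250, 0o40330), (0o40330, 0o40620)]
hits = [1] * 21 + [10] * 6 + [1] * 23          # 50 intervals of 8 words each
print(segment_counts(hits, 0o40000, 0o40620, SEGMENTS))
```

With these boundaries each interval is exactly eight words wide and falls wholly inside one segment, so no count is split.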

The  reader  should  note  that  this  is  a  contrived  exam¬ 
ple  with  the  interval  and  segment  boundaries  chosen  to  al¬ 
low  a  perfect  mapping.  In  practice,  this  does  not  usually 
happen.  A  profile  interval  can  overlap  a  segment  boundary, 
making  it  difficult  to  determine  which  segment  receives  the 
profile count. This is probably not a major problem, because
other  segment  counts  can  be  used  to  determine  the  probable 
correct segment. If the program is very large, the intervals
of the profile can actually be larger than the logical
segments,  since  the  profile  generator  program  presently 
allows  only  50  intervals.  Several  logical  segments  could 
then  occur  in  one  interval,  again  making  a  correct  mapping 
of  the  count  difficult.  Two  possible  solutions  to  this 
problem  exist.  The  number  of  profile  intervals  can  be  dy¬ 
namically  adjusted  (within  main  memory  limits)  to  match  the 
program  size,  or  several  profiles  can  be  run  with  each 
"looking"  at  a  different  portion  of  the  program.  This 
latter  approach  was  used  in  the  manual  tuning  of  the  wind 
tunnel  and  laser  materials  programs. 

Another  approach  to  determining  which  program  segments 
to  microprogram  is  to  choose  only  loop  or  inner-loop  seg¬ 
ments  as  in  the  second  approach  of  the  background  informa¬ 
tion.  This  approach  eliminates  the  need  for  any  type  of 
activity  profile  generation  and  the  problem  of  mapping 
profile  intervals  to  program  segments.  The  segments  must 
still be identified, but this has to be done anyway. This
approach has the advantage of simplicity, and the results
from the three approaches discussed show that it compares
favorably  with  the  others.  Also,  the  analysis  of  the  two 
programs  in  this  thesis  study  supports  the  theory  that 
loops  account  for  much  of  the  program  activity,  and  are  the 
best  candidates  for  microprogramming. 

From  a  user  point  of  view,  determining  which  segments 
of  a  program  to  microprogram  is  the  easier  of  the  two  basic 
steps in the tuning process. The concept of program segments,
such as loops and non-loops, and their execution
times is nothing really new to the higher-level language
programmer. The second step, synthesizing the microprograms,
is a far less familiar task: it requires familiarity
with the architecture of the machine as well as assembly
language and microprogramming expertise. Thus, the automation
of this step is even more important. An example is
used here to explain the synthesis process.

Figure  16  shows  the  synthesis  process  for  two  macro¬ 
instructions  from  the  manually-tuned  wind  tunnel  subroutine 
SPEED. The two macroinstructions, "LDA .YZT" and "STA
..YZT", are shown along with the two "BSS" pseudo-instructions,
which define memory locations for .YZT and ..YZT.
The function of these instructions is simply to load the
"A" register with the contents of .YZT and store those
contents into ..YZT (i.e., ..YZT = .YZT).

The  figure  shows  the  breakdown  of  the  instructions' 
machine code into four fields: D/I, opcode, Z/C, and argument
relative address (all numeric values in octal). The
opcode  and  argument  relative  address  are  self-explanatory. 
The  D/I  is  a  bit  indicating  whether  the  argument  relative 
address  is  used  directly  or  indirectly.  The  Z/C  bit  indi¬ 
cates  whether  the  argument  address  is  relative  to  page  zero 
of  memory  or  the  current  page.  For  these  two  instructions, 
both  addresses  are  direct  and  relative  to  the  current  page 
—  page  42000  as  indicated  by  the  instruction  addresses. 
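
The field breakdown can be sketched in a few lines. The bit layout used here (bit 15 for D/I, bits 14-11 for the opcode, bit 10 for Z/C, bits 9-0 for the argument relative address) is an assumption; it is chosen because it reproduces the field values given for the two machine code words of the example. The function name is hypothetical.

```python
def decode(word):
    """Split a 16-bit memory reference instruction into its four fields.

    Assumed layout: bit 15 = D/I, bits 14-11 = opcode, bit 10 = Z/C,
    bits 9-0 = argument relative address.
    """
    return {
        "d_i": (word >> 15) & 0o1,
        "opcode": (word >> 11) & 0o17,
        "z_c": (word >> 10) & 0o1,
        "address": word & 0o1777,
    }


# Machine code for "LDA .YZT" and "STA ..YZT" from the example (octal).
for word in (0o062654, 0o072670):
    f = decode(word)
    print(f"{word:06o}: D/I={f['d_i']} opcode={f['opcode']:o} "
          f"Z/C={f['z_c']} address={f['address']:04o}")
```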

Two Macroinstructions to be Microprogrammed

Instruction
Address      Machine Code   Label    Instruction

042647       062654         LOOP2    LDA  .YZT
042650       072670                  STA  ..YZT
042654       000000         .YZT     BSS  1
042670       000000         ..YZT    BSS  1

Machine Code Breakdown (all values in octal)

                                     Argument    Argument
Machine                              Relative    Absolute
Code       D/I    Opcode    Z/C      Address     Address

062654     0      14        1        0654        042654
072670     0      16        1        0670        042670

Microcode Look-up Table

D/I    Opcode    Microcode

0      14        44074707  44000447  03700547
0      16        44074707  03700461  37726007

Synthesized Macroinstructions

LOOP2    OCT 105620    Invoke microprogram
         OCT 042654    .YZT address
         OCT 042670    ..YZT address

Synthesized Microinstructions

44074707    READ INC PNM P         Read argument address
44000447    READ INC M TAB         Read argument
03700547    PASS A TAB             Load into register A
44074707    READ INC PNM P         Read argument address
03700461    MPCK PASS M TAB        Address to M register for WRTE;
                                   do memory protect check
37726007    WRTE RTN PASS TAB A    Store contents of A;
                                   return to FETCH routine

Figure 16. Microprogram Synthesis Example

The breakdown of the machine code is the key to the
synthesis process. The D/I and opcode fields can be used
as  an  index  into  a  microcode  look-up  table  to  obtain  a  se¬ 
quence  of  microinstructions  which  will  replace  the  original 
macroinstruction.  This  partial  look-up  table  is  derived 
indirectly  from  the  microcode  for  the  basic  instruction  set 
stored  in  control  store  ROM.  A  listing  for  the  entire  ba¬ 
sic  instruction  set  of  the  HP  21MX  is  available  in  Appendix 
E  of  Reference  32.  Table  XI  shows  the  microinstructions 
for  the  memory  reference  group  instructions,  such  as  the 
LDA  and  STA  instructions  in  the  example.  The  microcode  in 
Table  XI  cannot  be  used  directly  to  translate  a  macroin¬ 
struction  to  a  microroutine.  The  reason  for  this  is  that 
the  macroinstruction  is  performed  partially  by  the  micro- 
coded  fetch  routine  (shown  at  the  bottom  of  Table  XI)  and 
partially  by  the  macroinstruction's  microroutine.  The  LDA 
microroutine,  for  example,  consists  of  only  one  microin¬ 
struction  as  shown  in  Table  XI  (LDA  and  LDB  are  shown  as 
LD*).  The  fetch  routine  in  this  case  performs  the  major 
part  of  the  instruction.  Another  reason  the  microcode  from 
the  table  cannot  be  used  directly  is  that  the  microopera¬ 
tions  within  the  microinstructions  often  perform  operations 
which  are  conditional  on  information  in  the  instruction 
register.  When  the  macroinstructions  are  replaced  by  mi¬ 
crocode,  they  are  no  longer  fetched,  and  are  not  stored 
into  the  instruction  register.  The  microroutines  stored  in 
the  look-up  table  must  compensate  for  the  lack  of  the  in¬ 
struction  fetch  and  the  information  in  the  instruction 
register.

The  microroutines  for  the  LDA  and  STA  instructions  in 
the  example  are  each  three  microinstructions  long.  The 
mnemonics for the microcode from the look-up table are
shown  at  the  bottom  of  Figure  16.  The  "RTN"  microoperation 
of  the  last  microinstruction  is  not  actually  encoded  in  the 
table.  It  is  added  to  the  last  microinstruction  to  trans¬ 
fer  control  back  to  the  macroinstruction  level. 
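
A minimal sketch of the table look-up follows. The table holds only the two entries of the example, written as micro-order mnemonics transcribed from Figure 16; the dictionary form, the function name, and the decision to append "RTN" without modelling field positions inside a microinstruction are all assumptions of the sketch.

```python
# Look-up table keyed by (D/I, opcode). Each entry is a microroutine,
# given as lists of micro-order mnemonics transcribed from Figure 16.
# Field positions within a microinstruction are not modelled.
LOOKUP = {
    (0, 0o14): [["READ", "INC", "PNM", "P"],      # LDA, direct
                ["READ", "INC", "M", "TAB"],
                ["PASS", "A", "TAB"]],
    (0, 0o16): [["READ", "INC", "PNM", "P"],      # STA, direct
                ["MPCK", "PASS", "M", "TAB"],
                ["WRTE", "PASS", "TAB", "A"]],
}


def synthesize(fields):
    """Concatenate the microroutines for consecutive macroinstructions.

    fields: list of (d_i, opcode) pairs. Raises KeyError for a
    macroinstruction that has no entry in the table.
    """
    routine = []
    for key in fields:
        routine.extend(list(m) for m in LOOKUP[key])   # copy the entries
    if routine:
        routine[-1].append("RTN")   # return to the macroinstruction level
    return routine


for microinstruction in synthesize([(0, 0o14), (0, 0o16)]):
    print(" ".join(microinstruction))
```

Copying the table entries before appending "RTN" keeps the table itself unchanged, so the same entry can end one synthesized routine and appear mid-routine in another.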

The  Z/C  and  argument  relative  address  fields  are  used 
to  form  an  argument  absolute  address.  If  the  address  is 
relative  to  page  zero,  the  relative  and  absolute  addresses 
are  equivalent.  If  the  address  is  relative  to  the  current 
page,  the  current  page  address  is  added  to  the  relative 
address. The example shows the formation of the .YZT and
..YZT absolute addresses. These addresses are sequentially
annexed  to  an  argument  list  following  a  synthesized  macro¬ 
instruction.  This  macroinstruction  invokes  the  synthesized 
microroutine,  and  along  with  the  argument  list,  replaces 
the  original  macroinstructions. 
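
The address formation and the resulting replacement words can be sketched as below. The 1024-word page size follows from a 10-bit relative address, which is an assumption consistent with the example values; the function names are hypothetical, while the invoking word 105620 and the addresses are those shown in Figure 16.

```python
PAGE_MASK = 0o1777     # assumed: 10-bit relative address, 1024-word pages
INVOKE = 0o105620      # macroinstruction invoking the microprogram (Figure 16)


def absolute_address(instr_addr, z_c, rel):
    """Form the argument absolute address from the Z/C bit and relative address."""
    if z_c == 0:                               # relative to page zero
        return rel
    return (instr_addr & ~PAGE_MASK) | rel     # relative to the current page


def synthesized_macros(instructions):
    """instructions: list of (instr_addr, z_c, rel) for the replaced macroinstructions.

    Returns the invoking macroinstruction followed by the argument list.
    """
    return [INVOKE] + [absolute_address(*i) for i in instructions]


# LDA .YZT at 042647 and STA ..YZT at 042650 (octal), both current-page.
words = synthesized_macros([(0o042647, 1, 0o654), (0o042650, 1, 0o670)])
print([f"{w:06o}" for w in words])
```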

Synthesizing  the  microroutines  in  the  manner  described 
usually  does  not  produce  optimal  microcode.  Optimization 
should  be  performed  as  another  step  of  the  synthesis  pro¬ 
cess,  as  in  the  first  tuning  approach  discussed.  Fre¬ 
quently  used  argument  addresses  should  be  removed  from  the 
argument  list  and  stored  into  available  micro-level  regis¬ 
ters.  The  address  can  then  be  accessed  by  a  register 
transfer  instead  of  a  main  memory  read,  saving  at  least  one 
microinstruction. Microinstructions which access main
memory  should  be  followed  by  microinstructions  which  do  not 
access  the  memory  data  register,  the  "T"  register.  Memory 
reads  and  writes  require  two  microcycles,  and  such  in¬ 
structions  cause  a  "processor  freeze"  (Ref.  32:3-14)  until 
the  read  or  write  is  complete.  At  least  three  "freezes" 
occur  in  the  synthesized  microroutine  of  the  example,  but 
nothing  can  be  done  in  this  case.  Parallel  microoperations 
should  be  used  as  much  as  possible.  An  example  of  this  is 
the  addition  of  the  "RTN"  to  the  last  microinstruction  of 
the  example,  rather  than  making  it  a  separate  microin¬ 
struction.  Redundant  or  negated  microinstructions  should 
also  be  eliminated  (Ref.  17:87).  These  optimization  checks 
can  probably  be  done  during  the  synthesis,  but  may  be  more 
easily  done  during  a  "second  pass". 
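
One of these checks, detecting likely processor freezes, might be sketched as follows. The rule used is a deliberate simplification and an assumption of this sketch: a freeze is counted whenever a microinstruction that references the T register ("TAB") immediately follows one containing READ or WRTE. The mnemonics are transcribed from Figure 16; the function name is hypothetical.

```python
MEMORY_OPS = {"READ", "WRTE"}


def count_freezes(microroutine):
    """Count T-register references that directly follow a memory access.

    microroutine: list of strings of micro-order mnemonics. The freeze
    rule is a simplified model, not the machine's actual timing logic.
    """
    freezes = 0
    for prev, cur in zip(microroutine, microroutine[1:]):
        if MEMORY_OPS & set(prev.split()) and "TAB" in cur.split():
            freezes += 1
    return freezes


ROUTINE = [
    "READ INC PNM P",
    "READ INC M TAB",
    "PASS A TAB",
    "READ INC PNM P",
    "MPCK PASS M TAB",
    "WRTE RTN PASS TAB A",
]
print(count_freezes(ROUTINE))
```

Under this simplified rule the example routine yields three freezes, in line with the "at least three" noted above. A second-pass optimizer could use such a count to decide whether reordering independent microinstructions is worthwhile.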

The  final  products  of  the  synthesis  step  are  the  syn¬ 
thesized  macroinstructions  and  microinstructions  as  shown 
at  the  bottom  of  Figure  16.  The  macroinstructions  may  di¬ 
rectly  overwrite  the  ones  they  are  replacing  in  main  memory 
or  in  the  memory  image  file,  assuming  no  relocation  or  op¬ 
erating  system  problems  exist.  The  microinstructions  may 
be written to the WCS or to a file using existing microprogram
utilities.

The  synthesis  step,  while  more  difficult  than  the 
analysis step, is straightforward and is quite adaptable to
an  automated  tuning  system.  Probably  the  most  difficult 
problem  is  the  building  of  the  microcode  look-up  table, 
since this is largely a manual process. The documented
microroutines for the HP 21MX base instruction set can be
used as a basis for this table, but require substantial
modification  because  of  the  elimination  of  the  fetch  in¬ 
struction  as  discussed  earlier.  Although  building  the  ta¬ 
ble  presents  a  substantial  manual  microprogramming  effort, 
it  can  be  done  and  is  not  considered  a  major  obstacle  to 
automating  the  synthesis  step. 

Summary 

This  chapter  has  dealt  with  the  feasibility  of  imple¬ 
menting  an  automated  tuning  system  on  the  AFIT  HP  21MX 
computer.  Three  automatic  systems  from  the  literature  were 
presented  to  provide  background  on  the  tuning  process  and 
different  approaches  to  the  implementation  of  such  a  sys¬ 
tem.  The  general  requirements  for  an  AFIT  system  were 
discussed  in  terms  of  user-system  interface,  system  input 
and  output,  and  performance  objectives.  Finally,  possible 
approaches  to  automating  the  system  were  discussed  in  terms 
of  accomplishing  the  two  basic  steps  of  the  tuning  process 
—  determining  which  segments  of  the  program  to  micropro¬ 
gram,  and  automatically  synthesizing  the  microprograms. 
Although  many  of  the  implementation  details  have  not  been 
discussed  here,  an  automated  tuning  system  on  the  AFIT  HP 

Results, Conclusions, and Recommendations


Introduction 

This  thesis  study  focused  on  the  performance  improve¬ 
ment  of  HP  21MX  application  programs  using  microprogramming 
tuning  techniques.  Routines  from  two  application  programs 
were  chosen  as  candidates  for  microprogramming  as  a  result 
of  a  survey  of  HP  users  at  Wright-Patterson  Air  Force  Base. 
The  two  routines  chosen  were  a  stress  calculation  routine 
for  a  wind  tunnel  control  program  and  a  refractive  index 
calculation  routine  for  a  laser  materials  modeling  program. 
Microprograms  were  written  and  applied  at  points  in  the 
routines  indicated  by  activity  profile  analysis.  The  speed 
improvement  of  the  resulting  programs  was  then  measured. 
The  experience  gained  from  tuning  the  two  application  pro¬ 
grams  and  studies  in  the  literature  provided  the  background 
for  investigating  the  feasibility  of  developing  an  auto¬ 
mated  tuning  system  on  the  AFIT  HP  21MX.  This  chapter 
lists  the  results,  conclusions,  and  recommendations  of  the 
thesis  study. 

Results 

The  following  are  considered  the  major  results  of  the 
study: 

1.  The  performance  improvement  of  the  wind  tunnel 
stress calculation routine was about 31 per cent. This
resulted in an operational improvement of 33 per cent in
the  rod  adjustment  process  of  the  wind  tunnel  control  pro¬ 
gram  by  allowing  the  simultaneous  adjustment  of  four  rather 
than  three  rods.  The  routine  was  subsequently  integrated 
into  the  operational  version  of  the  program.  This  may  have 
been the first user-microprogram to be applied to an operational
HP 21MX program at Wright-Patterson.

2.  The  performance  improvement  of  the  laser  materials 
routine was about 10 per cent, which was not enough to noticeably
improve the waveform display for the user. This
program  showed  the  limitations  of  microprogramming  in  im¬ 
proving  a  routine  with  a  large  number  of  floating  point 
operations.  Writing  and  debugging  the  microprogram  for 
this  routine  also  provided  experience  working  with  large 
microprograms  and  leveled  subroutines. 

3.  Both  the  wind  tunnel  and  laser  materials  micro¬ 
programs  were  developed  using  software  engineering  tech¬ 
niques.  This  resulted  in  structured  microprograms  that 
were  documented  much  better  than  the  original  FORTRAN  and 
assembly  language  candidate  programs. 

4.  Work  on  the  two  application  programs  exposed  at 
least  two  HP  users  at  Wright-Patterson  to  user-micropro¬ 
gramming.  These  users  will  hopefully  consider  the  possi¬ 
bility  of  applying  microprogramming  to  future  applications. 

5.  The  investigation  into  the  feasibility  of  devel¬ 
oping  an  automated  tuning  system  on  the  AFIT  HP  computer 
showed  that  such  a  system  was  feasible,  and  possible  ap¬ 
proaches  to  the  development  were  discussed. 


Conclusions 

Based on the above results, the following conclusions
are  made: 

1.  Both  of  the  application  programs  tuned  involved 
floating  point  operations.  Although  two  programs  are 
hardly  a  large  enough  sample  on  which  to  base  any  hard 
conclusions,  the  inference  here  is  that  the  HP  21MX  appli¬ 
cation  programs  in  greatest  need  of  performance  improvement 
are  those  with  floating  point  operations.  Ironically, 
these  are  the  ones  that  can  be  helped  the  least  with  mi¬ 
croprogramming  on  the  HP  21MX-M  Series  or  21MX-E  Series 
computers.  The  21MX-F  Series,  however,  has  floating  point 
operations  implemented  in  hardware,  and  programs  with  a 
large  number  of  floating  point  operations  running  on  this 
series should benefit as much from microprogramming as
programs  with  non-floating  point  operations. 

2.  Microcode  can  be  structured  and  well  documented 
using  software  engineering  techniques.  The  complexity  of  a 
program,  however,  can  increase  significantly  with  the  ad¬ 
dition  of  microcode.  The  laser  materials  code  segment,  for 
example,  was  changed  from  a  simple  10-statement  FORTRAN 
"DO"  loop  to  a  "CALL"  to  a  38-statement  assembly  language 
routine,  which  invoked  a  256-word  microprogram!  All  this 
was  done  for  a  10  per  cent  increase  in  speed!  In  this 
particular  case  the  trade-off  of  speed  versus  complexity 
could  not  be  justified,  because  the  increase  in  complexity 
greatly outweighed the overall benefit of the increase in
speed. In the wind tunnel program, however, the routine
was  already  at  the  assembly  language  level.  Thus,  much  of 
the  complexity  already  existed,  and  substituting  the  code 
to  invoke  the  microprogram  actually  decreased  the  assembly 
language  code.  The  microprogram  was  about  one-half  as  long 
as  the  laser  materials  microprogram,  and  the  speed  increase 
was  three  times  as  much.  The  trade-off  in  this  case  was 
justifiable. 

3.  The  manual  tuning  technique  used  in  this  thesis 
study  is  too  cumbersome  to  become  widely  used  at  Wright- 
Patterson  (or  anywhere  else).  To  use  this  technique  a 
programmer  must  learn  the  HP  assembly  language,  the  micro¬ 
assembly  language,  their  associated  debugging  tools,  and 
the  internal  architecture  of  the  system.  The  training 
time, along with application program analysis and microprogram
development time, represents a large investment with
little  guarantee  of  results. 

Recommendations

The  following  recommendations  are  made  as  a  result  of 
this  thesis  investigation: 

1.  Since  the  application  programs  of  this  study  both 
involved  floating  point  operations,  further  study  could 
focus  on  other  types  of  applications  where  microprogramming 
might be of greater benefit. Two possibilities are high-speed
sorting and high-speed graphics. One ASD/ENAMA
program called MPASS (Refs. 26,36) could possibly use both
of  these  capabilities.  The  program  was  not  considered  in 
this  study  because  it  had  not  been  moved  from  the  CDC  com¬ 
puter  to  the  HP  21MX.  There  are  still  no  plans  to  move  the 
program,  but  a  high-speed  sorting  and  graphics  capability 
might  seriously  influence  such  a  move. 

2.  Users  with  programs  bound  by  floating  point  oper¬ 
ations  should  consider  upgrading  to  an  HP  21MX-F  Series 
machine.  The  hardware  floating  point  operations  of  the  F- 
Series  machine  are  roughly  20  times  as  fast  as  the  micro¬ 
programmed  functions  of  the  M-Series  machine  (Refs.  34:3-25 
and  35:13-20).  A  letter  received  from  the  Hewlett-Packard 
Company  indicates  that  no  hardware  floating  point  proces¬ 
sors  are  available  for  the  M-Series  machine  from  either 
them  or  any  other  known  source  (Ref.  37). 

3.  Because  of  the  drawbacks  of  the  manual  tuning 
technique  used  in  this  study,  it  is  recommended  that  an 
automated  system,  as  described  in  the  previous  chapter,  be 
designed  and  implemented  on  the  AFIT  HP  21MX  computer.  In 
support of this effort, upgrading to an F-Series machine should
be  seriously  considered  because  of  the  floating  point  and 
microprogramming  limitations  of  the  M-Series  machine.  The 
initial  work  could,  however,  be  done  on  the  M-Series.  An 
automated  system  would  make  tuning  of  application  programs 
practical  for  all  HP  programmers.  Without  such  a  system, 
user-microprogramming  has  little  future  on  this  machine. 

Microprogramming Concepts


Background 

Microprogramming  is  a  lower  level  of  computer  pro¬ 
gramming  (Ref.  27:Chapt.  2).  Program  instructions  written 
in  higher  level  languages  (HLL)  such  as  FORTRAN  are  first 
translated  or  compiled  into  machine-dependent  machine  lan¬ 
guage  instructions  (macroinstructions)  in  non-real  time. 
Each macroinstruction is then translated or mapped (interpreted)
into one or more microinstructions at the time of
program  execution.  This  instruction  hierarchy  is  illus¬ 
trated  in  Figure  17. 

Figure 18 shows an example of a microprogrammed computer
architecture. The execution of a program begins with
the  fetching  of  the  first  macroinstruction  of  the  program 
from  main  memory.  The  operation  code  (opcode)  of  the  mac¬ 
roinstruction  points  indirectly  to  the  control  store  (mi¬ 
croprogram  memory)  location  of  its  corresponding  microrou¬ 
tine.  The  microinstructions  are  sequentially  fetched  from 
the  control  store  and  executed,  activating  the  various 
hardware  register  transfer  control  points,  and  ultimately 
causing  the  computer  to  perform  the  operation  specified  in 
the  original  higher  level  language  instruction.  The  next 
macroinstruction  is  then  fetched,  and  the  process  continues 

Figure 17. Program Instruction Hierarchy

cuted. Each macroinstruction is essentially a call to a
microroutine.  The  opcode  of  the  macroinstruction  indicates 
the  "name"  (address)  of  the  microroutine,  and  the  other 
fields  of  the  macroinstruction  such  as  address  and  register 
fields  serve  as  the  parameters  to  be  passed  to  the  micro¬ 
routine. 
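
The "call to a microroutine" view can be illustrated with a toy fetch-and-dispatch loop. This is a conceptual sketch only: the three opcodes, the tuple instruction format, and the use of Python functions in place of microroutines are assumptions and do not model the HP 21MX control store.

```python
def run(memory, pc=0):
    """Toy macroinstruction interpreter.

    memory holds (opcode, operand) tuples for instructions and plain
    integers for data. Each opcode names a "microroutine" (here a Python
    function) and the operand is the parameter passed to it.
    """
    regs = {"A": 0}

    def lda(addr):
        regs["A"] = memory[addr]

    def inc(_addr):
        regs["A"] += 1

    def sta(addr):
        memory[addr] = regs["A"]

    control_store = {"LDA": lda, "INC": inc, "STA": sta}

    while memory[pc][0] != "HLT":
        opcode, operand = memory[pc]      # fetch the macroinstruction
        pc += 1
        control_store[opcode](operand)    # "call" its microroutine
    return memory


# Increment the word at address 4.
program = [("LDA", 4), ("INC", None), ("STA", 4), ("HLT", None), 41]
print(run(program)[4])
```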

Microprogramming  requires  much  greater  attention  to 
detail  than  programming  in  a  higher  level  language  or  as¬ 
sembly  language,  because  of  the  number  of  lower-level  op¬ 
erations  and  the  timing  of  those  operations.  The  micro¬ 
programmer  must  be  concerned  with  transfers  between  buses, 
registers,  and  main  memory,  and  the  operations  of  the 
arithmetic  logic  unit.  These  transfers  and  ALU  operations 
are  specified  in  the  fields  of  the  microinstruction.  These 
fields  are  called  micro-orders  or  micro-operations. 

Consider,  for  example,  the  simple  problem  of  incre¬ 
menting  a  variable  called  A.  In  a  higher  level  language 
this  can  be  done  with  one  instruction,  such  as  A=A+1.  In 
assembly  language  this  problem  may  require  three  instruc¬ 
tions  —  an  instruction  to  load  the  value  A  into  an  accu¬ 
mulator  register,  an  instruction  to  increment  the  accumu¬ 
lator,  and  an  instruction  to  store  the  new  value  of  A  back 
into  its  memory  location.  In  microcode  the  problem  re¬ 
quires  several  microinstructions,  each  microinstruction 
comprising several fields or micro-orders. The required
micro-orders  may  be  as  follows: 

1) Move the address of A from the instruction register
to  a  data  bus. 

2)  Move  the  address  from  the  bus  into  a  memory  address 
register. 

3)  Read  the  value  A  from  its  memory  location  into  a 
memory  data  register. 

4)  Move  the  contents  of  the  memory  data  register  to 
the  data  bus. 

5)  Move  the  value  of  the  data  bus  to  the  arithmetic 
logic unit (ALU).

6)  Perform  an  increment  operation  in  the  ALU. 

7)  Move  the  result  from  the  ALU  back  to  the  data  bus. 

8)  Move  the  result  from  the  bus  back  to  the  memory 
data  register. 

9)  Write  the  new  value  of  A  back  into  its  memory 
location. 

The  number  of  microinstructions  needed  to  perform 
these  nine  micro-orders  is  dependent  on  the  architecture  of 
the  particular  machine.  Three  or  four  microinstructions  is 
a  realistic  number.  For  example,  micro-orders  1  through  3 
may  make  up  the  fields  for  one  microinstruction,  4  through 
6  the  second,  and  7  through  9  the  third  microinstruction. 


How  Microprogramming  Improves  Speed 

Since  higher  level  language  instructions  end  up  as 
microinstructions  anyway,  it  is  not  apparent  that  directly 
microprogramming  all  or  part  of  a  program  would  have  any 
effect  on  its  execution  speed.  There  are  hidden  factors, 
however,  which  do  have  an  effect.  Some  of  the  most  impor¬ 
tant  factors  are  instruction  fetch  time,  memory  speed, 
parallelism,  and  other  additional  micro-level  capabilities. 

The  time  required  to  fetch  an  instruction  from  memory 
represents  35  to  45%  (Ref.  7:11)  of  the  total  execution 
time of an instruction. As shown in Figure 17, each microroutine
of the computer's basic instruction set concludes with a
jump to a macroinstruction fetch routine.
By  combining  the  microroutines  generated  by  two  or  more 
macroinstructions,  the  instruction  fetches  between  micro¬ 
routines  are  eliminated.  Combining  microroutines  essen¬ 
tially  creates  a  new  macroinstruction  with  the  power  of 
several  of  the  original  macroinstructions,  and  only  one 
instruction  fetch  is  required.  The  elimination  of  the  ex¬ 
tra  instruction  fetches  can  significantly  improve  the  exe¬ 
cution  speed. 
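
A back-of-the-envelope estimate of this effect is sketched below. It assumes that every original macroinstruction takes the same time and that the fetch is a fixed fraction of that time, and it ignores any extra memory reads the combined microroutine needs to pick up its arguments. The function name is hypothetical.

```python
def fetch_elimination_speedup(fetch_fraction, n_combined):
    """Estimated speed ratio from merging n macroinstructions into one.

    Assumes equal-length macroinstructions in which the fetch takes
    fetch_fraction of the execution time; merging removes n - 1 fetches.
    """
    if not 0 <= fetch_fraction < 1 or n_combined < 1:
        raise ValueError("fetch_fraction must be in [0, 1) and n_combined >= 1")
    remaining = 1 - fetch_fraction * (n_combined - 1) / n_combined
    return 1 / remaining


# A 40 per cent fetch share, within the 35 to 45 per cent range cited above.
for n in (2, 4, 8):
    print(n, round(fetch_elimination_speedup(0.40, n), 2))
```

Under these assumptions the gain from fetch elimination alone approaches, but never reaches, 1 / (1 - fetch_fraction) as more macroinstructions are combined.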

Microinstructions  also  must  be  fetched  from  memory. 
The  fetch  of  a  microinstruction  is,  however,  significantly 
faster  than  that  of  a  macroinstruction.  Because  the  con¬ 
trol  store  is  much  smaller  than  a  computer's  main  memory, 
faster  and  more  expensive  memory  components  can  be  used. 
This  memory  is  typically  two  to  five  times  faster  than  main 
memory  (Ref.  7:11). 

Parallelism  is  also  an  important  factor  in  improving 
execution  speed.  Since  a  microinstruction  is  made  up  of 
several micro-orders, independent parallel operations can be
specified  in  one  microinstruction.  An  example  of  this 
concept  is  the  performance  of  an  arithmetic  operation  and  a 
memory  operation  at  the  same  time.  Parallel  operations  can 
provide  additional  gains  in  speed.  The  number  of  parallel 
operations  is,  however,  highly  dependent  on  the  number  of 
fields  in  the  microword  (the  width  of  the  microword)  and 
the  algorithm  being  microprogrammed. 

The additional capabilities at the micro-level can
also contribute to an increase in execution speed. Additional
registers allow the storage of constants, frequently
used  operands,  and  intermediate  results.  This  additional 
storage  can  often  be  used  to  eliminate  time-consuming  ref¬ 
erences  to  main  memory.  The  direct  testing  of  flags  and 
direct  shift  control  can  also  be  used  in  some  applications 
to  improve  speed. 

Combining all of these factors can provide speed gains
many times that of assembly language. Realistic values are
between  2  and  20  times  (Ref.  9:49). 

Glossary of Terms Used in this Report (Ref. 7:2)

Arithmetic  Logic  Unit  —  Part  of  the  computer's  hardware 
which  performs  arithmetic,  logic,  and  other  operations. 

Assembly  Language  --  Computer-dependent  machine  language 
which  is  the  base  instruction  set.  In  a  microprogrammed 
computer,  each  Assembly  language  instruction  is  implemented 
by  a  specific  microprogram. 

Control  Processor  —  The  section  of  the  computer  which 
determines  what  the  computer  is  to  do  for  each  machine 
instruction. 

Control  Store  —  The  memory,  used  by  the  Control 
Processor, in which microprograms reside. It may be
implemented with ROM, PROM, and/or WCS.

Fields  —  Microinstructions  are  divided  into  several 
parts,  known  as  fields.  Each  field  specifies  different 
micro-operations,  which  may  be  independent  of  one  another. 

Machine  Instructions  --  The  binary-coded  bit  patterns 
that  actually  control  the  operations  of  the  computer  via 
the Control Processor. Programs written in symbolic
languages,  such  as  FORTRAN,  are  translated  to  machine 
instructions  by  Compilers,  Assemblers,  or  Interpreters. 


Microcode  —  Another  name  for  the  microinstructions  that 
make  up  a  microprogram,  either  in  source  language  or  in 
object  code  form. 

Microinstruction  —  One  instruction  of  a  microprogram, 
typically  made  up  of  one  or  more  micro-orders. 


Micro-order -- A complete operation, such as loading a
register  or  setting  a  register  equal  to  the  product  of  two 
other  registers.  Depending  upon  the  control  processor, 
more  or  less  than  one  micro-order  can  be  specified  by  a 
microinstruction. 


Microprogram -- A program written for a microprogrammed
computer at the control processor level to control the
computer. In a totally microprogrammed computer, every
machine instruction is implemented by a microprogram.

Microprogramming  —  The  process  of  developing 
microprograms  for  control  of  a  microprogrammed  computer. 

PROMs (Programmable Read-Only Memories) and ROMs (Read-Only
Memories) -- Components used to store microprograms in
control store. Once programmed, they cannot be altered.
ROMs differ from PROMs in that ROMs have their
microprograms installed when they are manufactured, while
PROMs are programmed after they have been made.

WCS (Writable Control Store) -- Control store implemented
with Random Access Memory so that the user can dynamically
alter its contents.
The following is a list of HP users at Wright-Patterson
Air Force Base:

Jim Leonard        AFWAL/AARF-2    55987    Bldg 23
Ken Greer          AFWAL/AARF-2    55987    Bldg 23
Jeff Barnes        AFWAL/AARF-2    55987    Bldg 23
Al Bowling         AFWAL/AARF-2    55987    Bldg 23
Ralph Pinney       AFWAL/AARF-2    55987    Bldg 23
Lloyd Clark        AFWAL/AARF-4    53050    Bldg 23
Glenn Williams     AFWAL/FIMN      52493    Bldg 26/240
Bob Ballard        AFWAL/FIMN      52493    Bldg 26/240
Frank Gondolfi     AFWAL/AAWP      55076    Bldg 821
Bryan Kent         AFWAL/AAWP      55076    Bldg 821
Conrad Phillippi   AFWAL/MLPJ      52334    Bldg 651
John Bankovskis    AFWAL/AARI      56361    Bldg 622
Carl Williams      AFWAL/FIEE      56078    Bldg 45/93
Mike Fabian        AFWAL/FIEE      56078    Bldg 45/93
John Warner        AFWAL/FIEE      56078    Bldg 45/93
Bill Griffin       ASD/ENAMA       55153    Bldg 125
John Steidle       ASD/ENAMA       55153    Bldg 125
Russ Soerens       ASD/ENAMA       55153    Bldg 125
Larry Linder       AFFDL           55205    Bldg 192

Appendix D


This appendix contains listings for the activity
profile generator program (ACTV) and its subroutines (RCORE
and IDGET). The original program was written by Jim
Leonard, AFWAL/AARF-2, WPAFB. The program was modified
slightly during this thesis study to run on the AFIT HP
21MX computer.


FTN4,L,B
      PROGRAM ACTV
      DIMENSION FBF(52),IPR(52),IN(3)
C     ACTIVITY PROFILE GENERATOR USING SUSPEND ADDRESS
C     FROM ID-SEGMENT.
C
C     BY JIM LEONARD
C     USAF AVIONICS LAB, WPAFB
    5 WRITE(1,10)
   10 FORMAT(" ACTIVITY PROFILE GENERATOR ",//,
     1" TYPE PROG NAME ")
      IN(1)=2H
      IN(2)=2H
      IN(3)=2H
      READ(1,20)IN
   20 FORMAT(3A2)
C     GET ADDRESS OF ID SEGMENT
      IDSEG=IDGET(IN)
      IF(IDSEG.NE.0)GOTO 100
      WRITE(1,30)IDSEG
   30 FORMAT("IMPROPER PROGRAM NAME, IDSEG= ",I8)
      GOTO 5
  100 WRITE(1,110)
  110 FORMAT("TYPE PROFILE BOUNDS, LOWER-UPPER & INTERRUPT
     1 TIME(1-9)",/," XXXXX XXXXX X")
      NT=0
      READ(1,120)IL,IU,NT
  120 FORMAT(2K6,I6)
      IF(NT.LE.0)NT=3

C     INITIALIZE PROFILE BUFFER
      DO 130 I=1,52
      FBF(I)=0.
  130 CONTINUE
      ID=IU-IL+1
      INCR=(IU-IL+1)/50
      IF(INCR*50.LT.ID)INCR=INCR+1
      IW1=IDSEG+15
      IW2=IDSEG+8
C     IF PROGRAM IS NOT CURRENTLY ACTIVE DON'T RECORD LOCATION
  300 CALL RCORE(IW1,IVAL)
      IF(IAND(IVAL,15B).NE.1)GOTO 200
C     READ SUSPENDED LOCATION
      CALL RCORE(IW2,IVAL)
C     CHECK FOR BEFORE BOUNDS
      IF(IVAL.GE.IL)GOTO 140
      FBF(1)=FBF(1)+1.
      GOTO 200
C     CHECK FOR BEYOND BOUNDS
  140 IF(IVAL.LE.IU)GOTO 150
      FBF(52)=FBF(52)+1.
      GOTO 200
C     MARK INTERVAL
  150 IVAL=(IVAL-IL)/INCR+2
      FBF(IVAL)=FBF(IVAL)+1.
C     TERMINATE MONITORING IF OPERATOR BREAKS
  200 IF(IFBRK(IDMY))500,210
  210 ISC=0
C     WAIT DESIRED INTERVAL
      CALL EXEC(12,ISC,1,0,-NT)
      GOTO 300

  500 WRITE(10,510)IN,IL,IU,INCR
  510 FORMAT(" PROGRAM ACTIVITY PROFILE FOR ",3A2,/,
     1" FROM",K8," TO",K8," IN INCREMENTS OF",I8)
  515 FORMAT(" INTERVAL NO. FROM TO NO OF HITS"
     1,"NORMALIZED HITS NORMAL ACCUM")
C     FIND MAX VALUE OF HISTOGRAM
      FMX=-1
      TSUM=0
      DO 520 I=2,51
      TSUM=TSUM+FBF(I)
      IF(FMX.LT.FBF(I))FMX=FBF(I)
  520 CONTINUE
C     EXIT IF NO ACTIVITY IN DESIRED RANGE
      IF(FMX.GT.0)GOTO 600
      IF((FBF(1)+FBF(52)).GT.0)GOTO 540
      WRITE(1,530)
  530 FORMAT("NO PROGRAM ACTIVITY RECORDED--AT ALL!!!")
      WRITE(10,530)
      STOP
  540 WRITE(1,550)FBF(1),FBF(52)
  550 FORMAT("NO PROGRAM ACTIVITY IN REGION OF INTEREST",/
     1,"BEFORE=",E13.7," AFTER=",E13.7)
      WRITE(10,550)FBF(1),FBF(52)
C     WRITE TABLE OF ACTIVITY PROFILE
  600 WRITE(10,515)
      SUM=0.
      TSM1=TSUM+FBF(1)+FBF(52)
      DO 650 I=1,52
      SUM=SUM+FBF(I)/TSM1
      FNORM=FBF(I)/FMX
      IFR=IL+INCR*(I-2)
      ITO=IFR+INCR
      IF(I.EQ.1)IFR=0
      IF(I.EQ.52)ITO=32767
      WRITE(10,610)I,IFR,ITO,FBF(I),FNORM,SUM
  610 FORMAT(4X,I3,6X,2K7,F10.0,F17.8,F15.5)
  650 CONTINUE

C     PLOT HISTOGRAM ON PRINTER
      WRITE(10,510)IN,IL,IU,INCR
      WRITE(10,700)
  700 FORMAT(" INTERVAL 0246"
     1" 8 1")
C     FOR EACH DATA INTERVAL
      SUM=-FBF(1)/TSUM
      DO 800 J=1,52
C     CLEAR PRINTER BUFFER
      DO 710 I=1,51
      IPR(I)=2H
  710 CONTINUE
C     CALCULATE INDEXES
      SUM=SUM+FBF(J)/TSUM
      INDX=SUM*50.+1.5
      IF((J.NE.1).AND.(J.NE.52))IPR(INDX)=2HII
      INORM=50.*FBF(J)/FMX+1.5
C     PRINT AN X IF OFF PLOT
      IF(INORM.LT.1)INORM=-1
      IF(INORM.GT.51)INORM=-51
C     PRINT AN * IF ON THE PLOT
      IF(INORM.GT.0)IPR(INORM)=2H00
      IF(INORM.LT.0)IPR(-INORM)=2HXX
      WRITE(10,720)J,(IPR(K),K=1,51)
  720 FORMAT(2X,I6,3X,51A1)
  800 CONTINUE
      STOP
      END
      END$
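
The interval arithmetic in ACTV (the INCR computation and the "MARK INTERVAL" step) can be restated compactly. The following is a minimal Python sketch for illustration only, not part of the thesis software; the function names bin_width and interval_index are hypothetical, and it assumes the lower bound does not exceed the upper bound.

```python
def bin_width(lower: int, upper: int, bins: int = 50) -> int:
    """Addresses per interval, rounded up so that `bins` intervals cover the range."""
    span = upper - lower + 1        # ID = IU - IL + 1
    width = span // bins            # INCR = (IU - IL + 1) / 50, integer divide
    if width * bins < span:         # IF (INCR*50 .LT. ID) INCR = INCR + 1
        width += 1
    return width


def interval_index(addr: int, lower: int, upper: int, bins: int = 50) -> int:
    """1-based slot in the 52-entry buffer: 1 = below bounds, 52 = above bounds."""
    if addr < lower:
        return 1                    # FBF(1) counts hits before the bounds
    if addr > upper:
        return bins + 2             # FBF(52) counts hits beyond the bounds
    return (addr - lower) // bin_width(lower, upper, bins) + 2
```

With bounds 2000 to 2143 octal (100 addresses), each interval covers two addresses, so the first in-range address falls in slot 2 and the last in slot 51.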


        NAM   RCORE
*
*  READS AND RETURNS THE CONTENTS OF A SINGLE
*  MEMORY LOCATION.
*
*  THIS IS A SUBROUTINE TO THE ACTIVITY PROFILE
*  GENERATOR JIM LEONARD WROTE.
*
*  THE ACTIVITY PROFILE SOURCE PROGRAM IS ON FILE &ACTV:
*
*  JOHN STEIDLE
*
        ENT   RCORE
        EXT   .ENTR
IW1     NOP                 ADDRESS OF ADDRESSES OF DESIRED VALUE
IW2     NOP                 ADDRESS FOR RETURNED CORE VALUE
RCORE   NOP
        JSB   .ENTR         GET PARAMETER ADDRESSES
        DEF   IW1
        LDA   IW1,I         READ ADDRESS
        LDA   0,I           READ CONTENTS
        STA   IW2,I         STORE IT
        JMP   RCORE,I
        END

FTN4,L
      INTEGER FUNCTION IDGET(IN)
      DIMENSION IN(3)
C     FUNCTION IDGET FINDS THE ADDRESS OF THE ID SEGMENT
C     OF THE PROGRAM NAME PASSED BY THE CHARACTER ARRAY
C     "IN". THIS FUNCTION IS PERFORMED BY SEQUENTIALLY
C     SEARCHING THROUGH THE ID SEGMENTS OF THE SYSTEM
C     LOOKING FOR A MATCH ON THE INPUT PROGRAM NAME. WHEN
C     THE CORRECT ID SEGMENT IS FOUND, THE ADDRESS OF THE
C     SEGMENT IS PASSED BACK IN "IDGET". IF THE SEGMENT
C     IS NOT FOUND, "IDGET" IS SET TO ZERO.
C
C     GET ADDRESS OF ID SEGMENT ADDRESS TABLE IN LOC 1657 OCTAL
      IPTR1=1657B
      CALL RCORE(IPTR1,IPTR2)
C     LOOP TO SEARCH THROUGH ID SEGMENT TABLES
  900 CALL RCORE(IPTR2,IDGET)
      IF (IDGET .EQ. 0) GOTO 950
      IPTR2=IPTR2+1
C     POINT TO NAME AREA OF TABLE AND COMPARE THE 3 WORDS
C     CONTAINING THE 5 CHARACTER PROGRAM NAME
      IPTR1=IDGET+12
      CALL RCORE(IPTR1,INAME)
      IF (INAME .NE. IN(1)) GOTO 900
      IPTR1=IPTR1+1
      CALL RCORE(IPTR1,INAME)
      IF (INAME .NE. IN(2)) GOTO 900
      IPTR1=IPTR1+1
      CALL RCORE(IPTR1,INAME)
C     COMPARE 5TH CHAR IN UPPER BYTE OF THE WORD.
C     IGNORE THE LOWER BYTE.
      IF (IABS(INAME-IN(3)) .GT. 255) GOTO 900
  950 RETURN
      END
      END$
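
The name comparison in IDGET can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the thesis code: pack_name and names_match are hypothetical names, and the sketch assumes the name is stored two ASCII characters per 16-bit word, high byte first and blank padded, as the 3A2 read in ACTV suggests.

```python
def pack_name(name: str) -> list[int]:
    """Pack up to 5 characters into three 16-bit words, two characters per word."""
    padded = name.ljust(6)[:6]      # blank padded, like IN(1..3) = 2H(blanks)
    return [(ord(padded[i]) << 8) | ord(padded[i + 1]) for i in (0, 2, 4)]


def names_match(id_words: list[int], typed_words: list[int]) -> bool:
    """Mirror IDGET: words 1 and 2 must be equal; word 3 passes when the
    absolute difference does not exceed 255."""
    return (id_words[0] == typed_words[0]
            and id_words[1] == typed_words[1]
            and abs(id_words[2] - typed_words[2]) <= 255)
```

Equal upper bytes always pass this test, whatever the lower byte of the ID segment word holds. The converse does not hold in general: names_match([0x5354, 0x5245, 0x5400], pack_name("STRES")) also returns True, although the fifth character in that hypothetical ID segment word is "T" rather than "S". The difference test is therefore an approximation to an upper-byte comparison.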
Appendix E


The following are instructions for running ACTV on the
AFIT RTE-III system:

1. ACTV and the program to be tested must be compiled
and loaded prior to running ACTV. If the core image files
already exist (saved from a previous session), type the
following commands:

RP,ACTV

RP,test program name

EX

Any key to get the * prompt

If the relocatable object files (the % files) for ACTV or
the test program have been loaded during the current
session, the corresponding RP command can be omitted.

2. Set the priority of ACTV to 89 by the following
command:

PR,ACTV,89

Any key to get the * prompt

3. Run ACTV by typing:

RU,ACTV

4. ACTV responds:

ACTIVITY PROFILE GENERATOR

TYPE PROG NAME

5. Enter the 5-character program name.

6. ACTV responds:

TYPE PROFILE BOUNDS, LOWER-UPPER & INTERRUPT TIME(1-9)
 XXXXX XXXXX X

7. Enter the octal address bounds of the program
region ACTV is to monitor and the interrupt time number,
which sets the rate at which the program is to be
interrupted. The smaller the interrupt time number, the
greater the interrupt rate and total "hits" for the
profile. Both the addresses and the interrupt time number
must be entered directly below the Xs.

8. Press any key to get the * prompt. Run the
program under test by:

RU,program name

9. The program will execute normally. When it
terminates, type:

BR,ACTV

This will terminate ACTV, and the activity profile will be
printed on the printer.
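
Step 7 requires column-aligned input because ACTV reads the reply with FORMAT(2K6,I6): two 6-column octal fields followed by a 6-column integer field, with the interrupt time defaulting to 3 when it is missing or not positive. The following Python sketch shows that layout for illustration. The name parse_bounds is hypothetical, and the sketch simply strips blanks from each field, which may differ from how the HP formatter treats blanks.

```python
def parse_bounds(line: str) -> tuple[int, int, int]:
    """Split a reply into (lower, upper, interrupt time) using 6-column fields."""
    padded = line.ljust(18)                          # three fields of 6 columns
    lower = int(padded[0:6].strip() or "0", 8)       # K6: octal lower bound
    upper = int(padded[6:12].strip() or "0", 8)      # K6: octal upper bound
    interval = int(padded[12:18].strip() or "0")     # I6: interrupt time number
    if interval <= 0:                                # IF (NT.LE.0) NT=3
        interval = 3
    return lower, upper, interval
```

For example, the reply " 02000 02143 2" typed under the Xs yields bounds 2000 and 2143 octal and an interrupt time number of 2.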


Appendix F


This appendix contains listings for SDRVR, STRES, and
SPEED. SDRVR is a special driver program, which was
written by AFWAL/FIMN personnel, to test the wind tunnel
routines STRES and SPEED on the AFIT 21MX computer. STRES
and SPEED are the original subroutines used in the wind
tunnel control program. The routines here are presented
essentially as they were received from the user. They are
not well commented, and no attempt was made to improve
them.


12 JUL 82
      PROGRAM SDRVR
      COMMON DDFL(13,13),IZTHM(1),CON1(1),CON6(1)
      DIMENSION IBXNPS(10,18),IBXJAK(8),
     *STRESS(14),YZT(14),YZT1(10)
      REAL MOM(14)
      DIMENSION DFL(13,13),DFL2(13,4),DFL3(13,3)
      EQUIVALENCE (DFL(1,7),DFL2),(DFL(1,11),DFL3),
     1(YZT1,YZT(5))

DATA  DFL/ 

1  . 2528E4,-. 1750E4, . 5417E3, 4790E2, . 123 3E2, 

1  2884E1, . 754 5E0,-. 2028E0, . 5681E-1,-. 1703E-1, 

1  . 4409E-2,-.1102E-2, . 1837 E- 3, 

2  1750E4, . 1968E4,-. 9478E3, .1805E3,-. 4645E2, 

2  . 1087E2,-. 2843E1, . 7644E0,-. 2141E0, . 6416E-1, 

2  1662E-1, . 4154E-2, . 6923E-3, 

3  . 5417E3, 9478E3, . 7798E3, 3674E3, .1297E3, 

3  3035E2, . 7941E1,-.2135E1, . 5979E0,-. 1792E0, 

3  . 4640E-1,-. 1160E-1, . 1933E-2, 

4  4790E2, . 1805E3,-.3674E3, . 4015E3, 2481E3, 

4  .  9002E2 ,  2355E2,  .  6331E.I,  1773E1,  .  5315E0, 

4  -.1376E0,.344lE-l,-.5734E-2, 

5  . 1233E2,-.4646E2, . 1297E3 , 2481E3 , . 2719E3 , 

5  - . 1828E3 , . 8358E2,-.2247E2,.6293El,-.1886El, 

5  . 4884 E0, 1221E0, . 2035E-1, 

6  2886E1, . 1087E2 , 3035E2, . 9002E2,-. 1828E3, 

6  . 3075E3,-. 2823E3, .1136E3,-. 3181E2, .9533 El, 

6  2469E1, . 6172E0,-. 102 9E0/ 

DATA  DFL 2/ 

7  . 7568E0,-. 2845E1, . 7942E1,-. 2355E2, . 8358E2, 

7  2823E3, . 3976E3, 2682E3, . 1144E3, 3427E2, 

7  .8876E1,-. 2219E1, . 3698E0, 

8  2045E0, . 7653E0,-. 2135E1, . 6331E1,-. 2247E2, 

8  . 1136E3,-. 268 2E3, . 3449 E3,-. 2705E3, . 1231E3, 

8  -.3187E2,.7968El,-.1328El, 

9  . 5761E-1, 2149E0, . 5976E0,-. 1772E1, . 6292E1, 

9  3181E2, . 1144E3, 2705E3, . 4069E3,-. 3179E3, 

9  . 1186E3, 2965E2, . 4942E1, 

*  1772E-1, . 6570E-1,-.1800E0, .5307E0,-. 188 5E1, 

*  . 9532E1,-. 3427E2  , . 1231E3 , 3179E3 , . 4359E3 , 

*  -.3609E3,.1752E3,-.2921E2/ 

DATA  DFL 3/ 

1  . 5690E-2 , 1935E-1, . 4937E-1,-. 1388E0, . 4870E0, 

1  2465E1, . 8871E1  , 3187E2, . 1186E3, 3609E3, 

1  . 624lE3,-.496lE3, .1394E3, 

2  2199E-2, .6440E-2,-.1477E-l, . 3634E-1 , 1211E0 
2  . 6138E0 , 2215E1, .7965E1,-. 2965E2, . 1752E3, 

2  -.4961E3, . 5492E3,-. 2049E3 
3  .4995 E- 3,-. 1367 E- 2, .  30  5  3E-2, 6550E-2, .  2009E-1
3  1017E0, . 368 5E0,-. 1327E1, . 4941E1,-. 2921E2, 

3  . 1394E3,-. 2049E3, . 9083E2/ 


C  LINES 
C 


ADDED 


REPLACE  READS 


DEVICES 


AVAILABLE 


      DATA IBXNPS/40*0,10*2423,2024,2064,2137,2234,2302,
     12153,2021,1850,1686,1524/
      IBXJAK(1)=2172
      IBXJAK(2)=2365
      IBXJAK(3)=2422
      IBXJAK(4)=2000
      IBXJAK(5)=2000
      IBXJAK(6)=2000
      IBXJAK(7)=0
      IBXJAK(8)=0
      DO 35 I=1,169
      DDFL(I)=DFL(I)
   35 CONTINUE
      IZTHM=2000
      CON1=5E-4
      CON6=5.4932E-4
      DO 2000 IROD=5,6
      CALL STRES(1,IROD,IBXJAK,IBXNPS(1,IROD),STRESS,MOM)
      IBXJAK(1)=2000
      IBXJAK(2)=2000
      IBXJAK(3)=2000
 2000 CONTINUE
      END
      SUBROUTINE STRES(IFCN,KIROD,IBXJAK,IDATA,STRESS,MOM,
     *IZXNPP)
C
C     FCN=1 FOR THUMWHEEL
C     FCN=2 FOR POTS
C
C     LAST PARAMETER IS NOT REQUIRED FOR THUMWHEELS
C
      COMMON DDFL(13,13),IZTHM(1),CON1(1),CON6(1)
      DIMENSION IBXJAK(8),STRESS(14),YZT(14),YZT1(10),
     *IDATA(10),IZXNPP(10)
      REAL JACK(14),LOAD(14),MOM(14)
      EQUIVALENCE (YZT1,YZT(5))
      DATA JACK/0.,6.,11.,16.,21.,26.,31.,33.5,36.,38.5,41.,
     *44.69,48.38,52.07/
      YZT(1)=0
      IF (IFCN.EQ.3) GO TO 3000
      IF (KIROD.GT.9) GO TO 100
      YZT(2)=(IBXJAK(1)-IZTHM)*CON1
      YZT(3)=(IBXJAK(2)-IZTHM)*CON1
      YZT(4)=(IBXJAK(3)-IZTHM)*CON1
      GO TO 200
  100 YZT(2)=(IBXJAK(4)-IZTHM)*CON1
      YZT(3)=(IBXJAK(5)-IZTHM)*CON1
      YZT(4)=(IBXJAK(6)-IZTHM)*CON1
  200 CONTINUE
      IF (IFCN.EQ.2) GO TO 2000
 1000 DO 1050 I=1,10
 1050 YZT1(I)=(IDATA(I)-IZTHM)*CON1
 1500 CALL SPEED(LOAD,MOM,STRESS,YZT,JACK)
      RETURN
 2000 DO 2050 I=1,10
 2050 YZT1(I)=(IDATA(I)-IZXNPP(I))*CON6
      CALL SPEED(LOAD,MOM,STRESS,YZT,JACK)
      RETURN
 3000 CALL SPEED(LOAD,MOM,STRESS,YZT,JACK)
      RETURN
      END


ASMB,L
        NAM   SPEED,7
        EXT   .ENTR
        ENT   SPEED
        COM   DDFL(338)
LOAD    BSS   1
MOM     BSS   1
STRES   BSS   1
YZT     BSS   1
JACK    BSS   1
SPEED   NOP
        JSB   .ENTR
        DEF   LOAD
        LDA   LOAD
        INA
        INA
        STA   .LOD1
        STA   .LOD2
        STA   .LOD3
        STA   .LOD4
        LDA   YZT
        INA
        INA
        STA   .YZT
        LDA   JACK
        INA
        INA
        STA   .JCK1
        STA   .JCK2
        STA   .JCK4
        LDA   MOM
        STA   .MOM1
        STA   ..A
        INA
        INA
        STA   .MOM3
        LDA   ..DFL
        STA   .DDFL
*  COMPUTE RESULTS BASED ON DEFLECTIONS ALONE
*  FIND LOADS


        LDA   =D-13
        STA   CNT2
LOOP2   LDA   .YZT
        STA   ..YZT
        DLD   .DDFL,I
        OCT   105040        FMP
.YZT    BSS   1
        DST   .LOD1,I
        ISZ   ..YZT
        ISZ   ..YZT
        ISZ   .DDFL
        ISZ   .DDFL
        LDA   =D-12
        STA   CNT1
LOOP1   DLD   .DDFL,I
        OCT   105040        FMP
..YZT   BSS   1
        OCT   105000        FAD
.LOD1   BSS   1
        DST   .LOD1,I
        ISZ   ..YZT
        ISZ   ..YZT
        ISZ   .DDFL
        ISZ   .DDFL
        ISZ   CNT1
        JMP   LOOP1
        ISZ   .LOD1
        ISZ   .LOD1
        ISZ   CNT2
        JMP   LOOP2

*  FIND MOMENT DISTRIBUTION AT JACKS
        LDA   =D-13
        STA   CNT1
        CLA
        CLB
LOOP3   OCT   105020        FSB
.LOD2   BSS   1
        ISZ   .LOD2
        ISZ   .LOD2
        ISZ   CNT1
        JMP   LOOP3
        DST   LOAD,I
        CLA
        CLB
        DST   .MOM1,I
        LDA   =D-13
        STA   CNT2
LOOP4   DLD   .JCK1,I
        OCT   105040        FMP
.LOD3   BSS   1
        OCT   105000        FAD
.MOM1   BSS   1
        DST   .MOM1,I
        ISZ   .JCK1
        ISZ   .JCK1
        ISZ   .LOD3
        ISZ   .LOD3
        ISZ   CNT2
        JMP   LOOP4
        DLD   LOAD,I
        DST   TEMP1
        DLD   MOM,I
        DST   TEMP2
        LDA   =D-12
        STA   CNT1
LOOP5   DLD   TEMP1
        OCT   105000        FAD
.LOD4   BSS   1

        DST   TEMP1
        DLD   .LOD4,I
        OCT   105040        FMP
.JCK4   BSS   1
        DST   TEMP3
        DLD   TEMP2
        OCT   105020        FSB
        DEF   TEMP3
        DST   TEMP2
        DLD   .JCK4,I
        OCT   105040        FMP
        DEF   TEMP1
        OCT   105000        FAD
        DEF   TEMP2
        DST   .MOM3,I
        ISZ   .MOM3
        ISZ   .MOM3
        ISZ   .LOD4
        ISZ   .LOD4
        ISZ   .JCK4
        ISZ   .JCK4
        ISZ   CNT1
        JMP   LOOP5
*  STRESS AT JACK CENTERLINE AND WALL
        LDA   .MP
        STA   ..M
        LDA   STRES
        STA   ..S
        LDA   =D-13
        STA   CNT2
LOOP7   DLD   ..A,I
        ISZ   ..A
        ISZ   ..A
        SSA
        CMA,INA
        OCT   105040        FMP
..M     BSS   1
        DST   ..S,I
        ISZ   ..S
        ISZ   ..S
        ISZ   ..M
        ISZ   ..M
        ISZ   CNT2
        JMP   LOOP7
        JMP   SPEED,I
..DFL   BSS   1
.JCK1   BSS   1
.JCK2   BSS   1
.JCK3   BSS   1
.MOM3   BSS   1
CNT1    BSS   1
CNT2    BSS   1
..A     BSS   1
..S     BSS   1
.MP     DEF   MP
MP      DEC   148.63,148.63,148.63
        DEC   594.5,594.5,594.5
        DEC   2378.,2378.,2378.,2378.
        DEC   2378.,594.5,594.5
TEMP1   BSS   2
TEMP2   BSS   2
TEMP3   BSS   2
        END   SPEED
Appendix G


This appendix contains listings for MSPED and LOADS.
MSPED is a version of SPEED modified to invoke a
microprogram substitution for LOOP1 and LOOP2 of SPEED.
Only the code up to and including the modifications is
shown here. The remainder of the code is as in the SPEED
listing of Appendix F. Also, the program name remains as
"SPEED" to minimize changes to calling routines. Only the
file names are changed to MSPED. LOADS is the microprogram
substitution for LOOP1 and LOOP2.
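
As a reading aid, the computation that LOOP1 and LOOP2 perform, and that LOADS replaces, can be sketched in Python. This is an illustration based on the comments in the LOADS listing, not a transcription of the microcode: the function names loads and cmlo are hypothetical, ordinary Python floats stand in for HP floating point, and DDFL is taken as a flat list in storage order.

```python
def loads(ddfl: list[float], yzt: list[float], n: int = 13) -> list[float]:
    """Multiply the n-by-n DDFL data by elements 1..n of YZT.

    Element 0 of YZT is skipped, and element 0 of the result is left at 0.0,
    matching the listing's note that the first elements are not used here.
    """
    load = [0.0] * (n + 1)
    k = 0                                # running DDFL index (pointer S3)
    for row in range(1, n + 1):          # LOOP2: one LOAD element per pass
        acc = 0.0                        # LOAD element is cleared first
        for col in range(1, n + 1):      # LOOP1: n multiply-and-add steps
            acc += ddfl[k] * yzt[col]
            k += 1
        load[row] = acc
    return load


def cmlo(operand: int) -> int:
    """One's complement of an 8-bit operand, as in the loop counter comments."""
    return ~operand & 0xFF
```

The listing loads both loop counters with the one's complement of 362 octal; cmlo(0o362) evaluates to 13, the loop count, and cmlo(0o377) evaluates to 0, the value used to clear the A register.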


ASMB,L
        NAM   SPEED,7
        EXT   .ENTR
        ENT   SPEED
        COM   DDFL(338)
LOAD    BSS   1
MOM     BSS   1
STRES   BSS   1
YZT     BSS   1
JACK    BSS   1
SPEED   NOP
        JSB   .ENTR
        DEF   LOAD
        LDA   LOAD
        INA
        INA
        STA   .LOD1
        STA   .LOD2
        STA   .LOD3
        STA   .LOD4
        LDA   YZT
        INA
        INA
        STA   .YZT
        LDA   JACK
        INA
        INA
        STA   .JCK1
        STA   .JCK2
        STA   .JCK4
        LDA   MOM
        STA   .MOM1
        STA   ..A
        INA
        INA
        STA   .MOM3
        LDA   ..DFL
        STA   .DDFL
*  COMPUTE RESULTS BASED ON DEFLECTIONS ALONE
*  FIND LOADS BY INVOKING THE LOADS MICROPROGRAM
LOADS   OCT   105600
.DDFL   BSS   1
.YZT    BSS   1
.LOD1   BSS   1
*  FIND MOMENT DISTRIBUTION AT JACKS
*  THE CODE HERE IS IDENTICAL TO SPEED
        END   SPEED




MICMX,L,R  21MX
$CODE=%LOADS::20,REPLACE      OBJECT TO DISK
         ORG   6000B

****************************************************** 
*  * 

*  LOADS  MICROPROGRAM  * 

*  * 


*  THIS  MICROPROGRAM  IS  A  SUBSTITUTE  FOR  THE  ASSEMBLY  * 

*  LANGUAGE  CODE  SEGMENT  LABELED  LOOP2  IN  THE  ROUTINE  * 

*  CALLED  SPEED.  WITH  THIS  ROUTINE  WRITTEN  INTO  WCS,  * 

*  THE  LOOP 2  CODE  SEGMENT  IN  SPEED  (FROM  THE  LABEL  * 

*  "LOOP 2"  TO  THE  "JMP  LOOP 2"  INSTRUCTION)  CAN  BE  * 

*  REPLACED  BY  THE  FOLLOWING  INSTRUCTIONS:  * 

*  * 


*  LOADS   OCT  105600   CALL THE LOADS MICROPROGRAM
*  .DDFL   BSS  1        ADDRESS OF THE DDFL ARRAY
*  .YZT    BSS  1        ADDRESS OF THE YZT ARRAY
*  .LOD1   BSS  1        ADDRESS OF THE LOAD ARRAY


* 

* 


* 

* 


*NOTE  THAT  .DDFL,  .YZT,  AND  .LOD1  ARE  ALREADY  DEFINED*

*  IN  SPEED.  THEY  MUST  BE  MOVED  TO  THE  LINES  FOLLOWING  * 

*THE  "LOADS  OCT  105600"  INSTRUCTION  AS  SHOWN  ABOVE.  * 
*THE  ORDER  IS  IMPORTANT  AS  THESE  ARE  PARAMETERS  FOR  * 
*THE  MICROPROGRAM.  * 

*  * 


FLD      EQU   %7031         ROM FLT PNT LOAD ROUTINE
PACK     EQU   %7052         ROM FLT PNT PACK ROUTINE
START    JMP   LOADS         JUMP TO MAIN MICROPROGRAM
         ORG   6002B         USE 6001B FOR DEBUG BKPNT

****************************************************** 

*  READ  CALLING  PARAMETERS  FROM  MEMORY  AND  STORE  IN  * 

*  SCRATCH  REGISTERS:  .DDFL  — >  S3  * 

*  .YZT  — >  S12  * 

*  .LODI  — >  S8  * 

*  ALSO  INITIALIZE  OUTER  LOOP  COUNTER  REGISTER  X  TO  13* 

****************************************************** 


LOADS    READ        INC   M     P      READ DDFL ADDR FROM MEMORY
                     INC   P     P      POINT TO YZT ADDRESS
                     PASS  S3    TAB    PUT DDFL ADDRESS INTO S3
         READ        INC   M     P      READ YZT ADDR FROM MEMORY
                     INC   P     P      POINT TO LOAD ARRAY ADDR
                     PASS  S12   TAB    PUT YZT ADDRESS INTO S12
         READ        INC   M     P      READ LOAD ADDR FROM MEMORY
         IMM   CMLO        X     %362   LOOP2 CNTR=13 (1'S CMP 362)
                     PASS  S8    TAB    PUT LOAD ADDRESS INTO S8


****************************************************** 

*  MATRIX  MULTIPLICATION  LOOP  --  THIS  CODE  SEGMENT  * 

*  PERFORMS  THE  FLOATING  POINT  MATRIX  MULTIPLICATION  * 

*  OF  THE  13X13  DDFL  MATRIX  BY  THE  14X1  YZT  MATRIX.  * 
*  THE  FIRST  ELEMENT  OF  THE  YZT  MATRIX  IS  NOT  USED,  *

*  MAKING  IT  EFFECTIVELY  A  13X1  MATRIX.  THE  RESULT  OF  * 

*  THE  MATRIX  MULTIPLICATION  IS  THE  14X1  MATRIX  CALLED* 

*  LOAD  (1ST  ELEMENT  AGAIN  NOT  USED).  * 

****************************************************** 


LOOP2    IMM   CMLO        A     %377   PUT A ZERO IN A-REG
               MPCK  INC   M     S8     PUT LOAD ADDR INTO M-REG
         WRTE        PASS  TAB   A      CLEAR WORD1 OF LOAD ELEMNT
                     INC   B     S8     POINT B TO WORD2 OF LOAD
               MPCK  INC   M     B      PUT ADDR INTO M-REG
         WRTE        PASS  TAB   A      CLEAR WORD2 OF LOAD ELEMNT
         IMM   CMLO        Y     %362   LOOP1 CNTR=13 (1'S CMP 362)
                     PASS  S4    S12    PUT YZT ADDR INTO S4
LOOP1    READ        INC   M     S3     READ 1ST WORD DDFL ELEMENT
                     INC   S3    S3     POINT TO 2ND DDFL WORD
                     PASS  A     TAB    PUT 1ST WORD INTO A-REG
         READ        INC   M     S3     READ 2ND WORD OF DDFL
                     INC   S3    S3     POINT TO NEXT DDFL ELEMENT
                     PASS  B     TAB    PUT 2ND WORD INTO B-REG

****************************************************** 

*  " JMP "  RATHER  THAN  "JSB"  TO  FMPY  AND  FADD  ROUTINES  * 

*  BECAUSE  THESE  ROUTINES  WILL  DESTROY  THE  RETURN  *

*  ADDRESS  BY  CALLING  OTHER  ROUTINES.  RETURN  IS  TO  THE* 

*  INSTRUCTIONS  LABELED  "RTNFMPY"  AND  "RTNFADD "  * 

****************************************************** 


         JMP                     FMPY   MULTIPLY DDFL&YZT ELEMENTS
*                                       PRODUCT GOES INTO A/B REGS
RTNFMPY  JMP                     FADD   ADD DDFL&YZT PROD TO LOAD
*                                       SUM GOES INTO A/B REGS
RTNFADD        MPCK  INC   M     S8     PUT LOAD ADDR INTO M-REG
         WRTE        PASS  TAB   A      WRITE A-REG TO LOAD ADDR
                     INC   S8    S8     POINT TO 2ND WORD OF LOAD
               MPCK  INC   M     S8     PUT ADDRESS INTO M-REG
         WRTE        PASS  TAB   B      WRITE B-REG TO 2ND WORD
                     DEC   S8    S8     POINT BACK TO 1ST WORD
                     INC   S4    S4     POINT TO NEXT YZT WORD
                     INC   S4    S4
                     DEC   Y     Y      DECREMENT LOOP1 COUNTER
         JMP   CNDX  TBZ   RJS   LOOP1  IF CNTR NOT=0 GO TO LOOP1
                     INC   S8    S8     POINT TO NEXT LOAD ELEMENT
                     INC   S8    S8
                     DEC   X     X      DECREMENT LOOP2 COUNTER
         JMP   CNDX  TBZ   RJS   LOOP2  IF CNTR NOT=0 GO TO LOOP2
               RTN   INC   P     P      RETURN TO SPEED

****************************************************** 

*  FLOATING  POINT  MULTIPLY  AND  MULTIPLY  (MPYX)  * 

*  ROUTINES.  THESE  ROUTINES  ARE  TAKEN  FROM  APPENDIX  E  * 

*  OF  THE  HP  MICROPROGRAMMING  21MX  COMPUTERS  OPERATING* 

*  AND  REFERENCE  MANUAL.  THESE  ROUTINES  ALSO  RESIDE  IN* 

*  CONTROL  STORE  ROM,  BUT  IT  IS  NECESSARY  TO  REPRODUCE* 

*  THEM  IN  WCS  TO  AVOID  THE  PROBLEM  OF  LEVELED  SUB-  * 

*  ROUTINE  CALLS  IN  THE  M-SERIES.  FMPY  HAS  BEEN  MODI-  *
*  FIED  SLIGHTLY  TO  HANDLE  A  PARAMETER  ADDRESS  IN  A  *

*  SCRATCH  REGISTER  RATHER  THAN  POINTED  AT  BY  THE  P  * 

*  REGISTER  (PROGRAM  COUNTER).  ALSO,  INDIRECT  ADDRES-  * 

*  SING  IS  NOT  USED.  MPYX  IS  UNCHANGED.  * 

****************************************************** 


FMPY     READ        INC   M     S4     READ 1ST PARAMETER WORD
         JSB                     FLD    STORE ARGS IN SCRATCH REGS
                     INC   S9    S9
                     PASS  L     S5     FORM EXP1+EXP2+1
                     ADD   S9    S9     AND SAVE IN S9
               R1    PASS  A     S10    FORM (WORD1 LOBITS)/2 IN A
                     PASS  S2    S7     PASS WORD2 HIBITS INTO S2
         JSB                     MPYX   JMP TO MPY SUB & RTN WITH
                     PASS  S5    B      HIBITS IN B & SAVE IN S5
                     PASS  S2    S11    PASS WORD1 HIBITS INTO S2
                     PASS  S11   A      LOBITS INTO A. SAVE INTO S
               R1    PASS  A     S6     FORM (WORD2 LOBITS)/2 IN A
         JSB                     MPYX   JMP TO MPY SUB & RTN WITH
                     PASS  L     A      LOBITS IN A & PASS INTO L
                     ADD   A     S11    ADD BOTH LOBITS. CHK FOR C
         JMP   CNDX  COUT  RJS   *+2    (ELSE TRUNCATE DIGITS)
                     INC   B     B      IF COUT, BUMP HIBITS
                     PASS  L     B      ADD HIBITS & SAVE IN S11
                     ADD   S11   S5
                     PASS  A     S7     PASS WORD2 HIBITS INTO A
         JSB                     MPYX   JMP TO MPY SUB & RTN WITH
               R1    PASS  A     A      LOBITS IN A. SAVE LOBITS/2
               COV   PASS  L     A      ADD LOBITS/2 TO HIBITS SUM
         ENV   L1    ADD   A     S11    SHIFT L1 TO REORIENT
         JMP   CNDX  AL15  RJS   *+3    CHECK FOR CAR. 1 INTO OR
         JMP   CNDX  OVFL        *+4    BORROW FROM HIBITS &
                     DEC   B     B      ADJUST ACCORDINGLY
         JSB                     PACK
         JMP                     RTNFMPY  RTN TO MAIN MICROPROGRAM
                     INC   B     B      CAN'T OVERFLOW FROM HIBITS
         JSB                     PACK
         JMP                     RTNFMPY  RTN TO MAIN MICROPROGRAM
MPYX           COV   PASS  S1    A      S1<-A (MULTIPLICAND). CLEAR
                     ZERO  B            CLEAR B FOR MULTIPLY
                     PASS  L     S2     L<-S2 (MULTIPLIER)
               RPT   PASS  CNTR  B      CLEAR COUNTER & SET REPEAT
         MPY   R1    ADD   B     B      MPY STEP (X16), (B,A)<-*L+
                     PASS        S1     TEST MULTIPLICAND
         JMP   CNDX  AL15  RJS   *+2    JUMP IF POSITIVE
                     SUB   B     B      UNDO LAST MPY STEP IF NEG
                     PASS        S2     TEST MULTIPLIER
         JMP   CNDX  AL15  RJS   RETURN JMP IF POSITIVE
                     PASS  L     S1     L<-MULTIPLICAND
               RTN   SUB   B     B      B<-MINUS L (CORRECTS NEG
*                                       MULT)
RETURN         RTN                      RETURN TO CALLING ROUTINE
******************************************************

*  FLOATING  POINT  ADD  ROUTINE.  THIS  ROUTINE  IS  ALSO  * 

*  TAKEN  FROM  APPENDIX  E  OF  THE  HP  MICROPROGRAMMING  * 

*  21MX  COMPUTERS  OPERATING  AND  REFERENCE  MANUAL.  IT  * 

*  ALSO  RESIDES  IN  CONTROL  STORE  ROM  BUT  IS  DUPLICATED* 

*  IN  WCS  TO  AVOID  THE  PROBLEM  OF  LEVELED  SUBROUTINE  * 

*  CALLS  IN  THE  M-SERIES.  FADD  HAS  BEEN  MODIFIED  TO  * 

*  EXCLUDE  THE  CODE  FOR  FLOATING  POINT  SUBTRACT  AND  TO* 

*  ALLOW  A  PARAMETER  ADDRESS  IN  A  SCRATCH  REGISTER  * 

*  RATHER  THAN  THE  P  REGISTER.  INDIRECT  ADDRESSING  IS  * 

*  NOT  USED,  SO  THE  CALL  TO  "INDIRECT"  IS  OMITTED.  * 

*  ALSO,  SCRATCH  REGISTER  S2  IS  USED  IN  PLACE  OF  S8  TO* 

*  FREE  S8  FOR  USE  IN  THE  MAIN  PROGRAM.  * 


FADD 


DIFR 


RVRS 


ADD2 


(  H  K  Jl  K  ) 

READ 

(  W  H  H  A  X 

x  h  x  n  x 

INC 

XXX" 

M 

S8 

READ  1ST  PARAMETER  WORD 

JSB 

FLD 

UNPACK  WORDS  INTO  SCR  REGS 

PASS 

B 

S7 

CHECK  FOR  WORD2=0 

JMP 

CNDX 

TBZ 

RJS 

*+2 

IF  NOT,  CONTINUE 

IMM 

LOW 

S5 

%200 

IF  SO,  MAKE  EXP  MOST  NEG 

PASS 

Sll 

CHECK  FOR  WORD1=0 

JMP 

CNDX 

TBZ 

RJS 

*  +  2 

IF  NOT,  CONTINUE 

IMM 

LOW 

S9 

%200 

IF  SO,  MAKE  EXP  MOST  NEG 

PASS 

A 

S6 

PAS 

L 

S5 

FIND  DIFF  IN  EXPS 

CLFL 

SUB 

S2 

S9 

&  STOR^  IN  S2,  FLAG=0 

JMP 

CNDX 

TBZ 

ADD  2 

IF  DIFF=0,  JMP  TO  ADD  STEP 

JMP 

CNDX 

AL15 

RVRS 

IF  NEG,  WORD 2 > WORD 1 

CMPS 

S2 

S2 

FORM  -DIFF 

INC 

S2 

S2 

&  STORE  -DIFF  IN  S2 

JMP 

SWAMPCHK 

PASS 

L 

B 

HOLD  B  IN  L 

PASS 

B 

Sll 

WORDKWORD2,  FILL  IN  B , A 

PASS 

A 

S10 

WITH  S11,S10 

PASL 

Sll 

ALSO  FILL  Sll , SlO , S9 

PASS 

S10 

S6 

WITH  B,S6,S5 

PASS 

S9 

S5 

IMM 

LOW 

L 

%  350 

FORM  -30B8  IN  L 

SUB 

S2 

IF  -DIFF>-31 , RTN  WITH 

LARGER  # 

JMP 

CNDX 

AL15 

OUT 

JMP  TO  RESTORE  A, B 

ARS 

Rl 

PASS 

B 

B 

NOW  START  SHIFT  LOOP 

INC 

S2 

S2 

INC  COUNTER 

JMP 

CNDX 

TBZ 

RJS 

SHIFT 

LOOP  UNTIL  DONE 

COV 

PASS 

L 

sio 

PASS  LOBIT3  INTO  L 

ADD 

A 

A 

ADD  &  CHECK  FOR  COUT 

JMP 

CNDX 

COUT 

RJS 

*  +  3 

IF  NOT,  JUMP 

IMM 

HIGH 

L 

%0 

CLR  L ( 15 )  FOR  OVFL 

ENV 

INC 

B 

B 

IF  SO, INC  HIBITS  &  ENABLE 

OVERFLOW 

CLFL 

PASS 

L 

Sll 

FLAG=0 

ENV 

ADD 

B 

B 

ADD  HIBITS  &  ENABLE  OVFL 

JMP 

CNDX 

OVFL 

RJS 

PKSUB 

IF  NO  OVFL,  RETURN 

JMP 

CNDX 

AL15 

*+2 

OVFL  IMPLIES  SIGN  CHANGE 

STFL 

LWF 

Rl 

PASS 

B 

LWF 

Rl 

PASS 

A 

* 

INC 

S9 

PKSUB 

JSB 

JMP 

OUT 

PASS 

B 

* 

PASS 

A 

JSB 

JMP 

END 


SO  FLAG=U  IF  AL15=0 
B  DO  FULL  WORD  SHIFT 

A  USING  FLAG  REG  TO  INJECT 

SIGN 

S9  BUMP  EXP 

PACK  REPACK  A, B  REGS 

RTNFADD  RTN  TO  MAIN  MICROPROGRAM 
Sll  PASS  MUCH  LARGER  WORD  INTO 

B ,  A 

S10 

PACK 

RTNFADD  RTN  TO  MAIN  MICROPROGRAM 


Appendix  H 


This appendix contains the listing for WCSLD. This
program was used to load the LOADS microprogram into WCS on
the AFWAL/FIMN HP 21MX. The microprogram object code is
predefined in a buffer, and the buffer is output to the WCS
two words at a time. The program was run under a DOS III
operating system which had not been configured for
microprogramming. The program will not run on the AFIT
RTE-III system because of the installed memory protect
option. The direct I/O instructions (STF, STC, OTA, OTB,
LIA, LIB) cause memory protect violations.
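
The header comments of the listing that follows describe how each microword travels to the WCS board as two 16-bit words: the upper 8 bits of the first word carry the WCS address (0 to 377 octal) and its lower 8 bits carry the upper 8 bits of the microword. The following Python sketch illustrates that layout. The names pack_wcs and unpack_wcs are hypothetical, and the sketch assumes a 24-bit microword whose remaining 16 bits travel in the second word.

```python
def pack_wcs(address: int, microword: int) -> tuple[int, int]:
    """Return the two 16-bit words output for one microword."""
    if not 0 <= address <= 0o377:
        raise ValueError("WCS address must fit in 8 bits")
    if not 0 <= microword < (1 << 24):
        raise ValueError("microword must fit in 24 bits")
    first = (address << 8) | (microword >> 16)   # address, then top 8 bits
    second = microword & 0xFFFF                  # low 16 bits of the microword
    return first, second


def unpack_wcs(first: int, second: int) -> tuple[int, int]:
    """Recover (address, microword) from the two words."""
    return first >> 8, ((first & 0xFF) << 16) | second
```

Because the address sits in the upper byte, adding 400 octal to the first word advances the WCS address by one, which is how the listing steps WCSAD. For instance, the second microword in the output buffer (301, 170351 octal) written to address 1 gives a first word of 701 octal.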


ASMB,L
        NAM   WCSLD,3
        ENT   WCSLD

* 

* 

*  WCSLD  WRITES  A  BUFFER  CONTAINING  PRESTORED  MICROCODE  OUT

*  TO  WCS.  ONCE  THE  MICROCODE  IS  WRITTEN  TO  WCS,  THE  PROGRAM 

*  READS  THE  MICROCODE  FROM  WCS,  COMPARES  IT  WITH  THE  CODE 

*  THAT  WAS  OUTPUT,  AND  WRITES  THE  INPUT  CODE  INTO  ANOTHER 

*  BUFFER.  IF  A  WORD  DOES  NOT  COMPARE,  AN  ERROR  COUNTER  IS 

*  INCREMENTED.  THE  MICROCODE  IS  WRITTEN  OUT  2  WORDS  AT  A 

*  TIME.  THE  UPPER  8  BITS  OF  THE  A-REG  CONTAINS  THE  WCS 

*  ADDRESS  (0-377B),  AND  THE  LOWER  8  BITS  OF  THE  A-REG

*  CONTAINS  THE  UPPER  8  BITS  OF  THE  MICROWORD.  READING  IS  ALSO 

*  DONE  2  WORDS  AT  A  TIME.  THE  WCS  ADDRESS  IS  FIRST  OUTPUT  TO 

*  THE  BOARD,  AND  THEN  THE  MICROWORD  AT  THAT  ADDRESS  IS  READ 

*  IN.  THE  ADDRESS  IS  NOT  READ  BACK  IN. 


* 

* 

sc 

EQU 

10B 

WCS  SELECT  CODE 

WCSLD 

NOP 

STF 

SC 

INIT  DIRECTION  FF 

LDA 

=B-206 

#  OF  MICROWORDS  IN  BUFFER 

★ 

STA 

CNT 

*  WRITE  MICROWORDS  OUT  TO  WCS 


* 

WRLP 

DLD 

. OBFl, I 

WRITE  LOOP 

IOR 

WCSAD 

"OR"  IN  WCS  ADDRESS 

OTA 

SC 

OUTPUT  MICROWORDS 

OTB 

SC 

STC 

SC 

WRITE  PULSE 

ISZ 

.OBFl 

POINT  TO  NEXT  MICROWORD 

ISZ 

.OBFl 

LDA 

WCSAD 

ADA 

=3400 

BUMP  WCS  ADDR  BY  1 

STA 

WCSAD 

ISZ 

CNT 

* 

JMP 

WRLP 

*  NOW READ THE MICROCODE BACK IN AND COMPARE
*
       CLA             1ST WCS ADDRESS = 0
       STA  WCSAD
       LDA  =B-206
       STA  CNT
RDLP   STF  SC         INIT DIRECTION FF
       LDA  WCSAD      GET WCS ADDRESS
       OTA  SC         OUTPUT ADDRESS TO WCS
       STF  SC         REINIT FF
       LIA  SC         INPUT MICROWORD
       LIB  SC
       CPA  .OBF2,I    DO COMPARES
       JMP  BCOMP
       ISZ  ERCNT      BUMP ERROR COUNT
BCOMP  ISZ  .OBF2      POINT TO 2ND WORD
       CPB  .OBF2,I
       JMP  STWRD
       ISZ  ERCNT      BUMP ERROR COUNT
STWRD  DST  .IBF,I     STORE MICROWORDS
       ISZ  .IBF       POINT TO NEXT POSITION
       ISZ  .IBF
       ISZ  .OBF2
       LDA  WCSAD      GET WCS ADDRESS
       ADA  =B400      BUMP IT BY 1
       STA  WCSAD
       ISZ  CNT
       JMP  RDLP
*

*
CNT    OCT  0          LOOP COUNTER
WCSAD  OCT  0          WCS ADDRESS
ERCNT  OCT  0          ERROR COUNTER
.IBF   DEF  IBUFF      INPUT BUFFER ADDRESS
.OBF1  DEF  OBUFF      OUTPUT BUFFER ADDRESS
.OBF2  DEF  OBUFF      ANOTHER ONE
*  START OF "LOADS" MICROCODE
OBUFF  OCT  321,100130,301,170351,220,074457
OCT  000,075717,017,101117,220,074457

OCT  000,075717,017,101557,220,074457 
OCT  357,145617,017,101357,357,176557 
OCT  000,056461,177,126017,000,056517 
OCT  000,024461,177,126017,357,145657 
OCT  017,167157,220,044457,000,045117 
OCT  017,100557,220,044457,000,045117 
OCT  017,100517,321,102530,321,105370 
OCT  000,056461,177,126017,000,057357 
OCT  000,056461,177,124017,007,157357 
OCT  000,047157,000,047157,007,173657 
OCT  320,001171,000,057357,000,057357 
OCT  007,171617,320,000571,000,075736
OCT  220,046457,301,141470,000,061417 
OCT  017,150157,004,161417,017,162544 
OCT  017,155057,301,104530,017,125217 
OCT  017,165057,017,127517,017,152544 
OCT  301,104530,017,126157,004,164557 
OCT  321,003571,000,024517,017,124157 
OCT  004,151517,017,154557,301,104530 
OCT  017,126544,017,126154,244,164542 
OCT  322,004271,325,044371,007,124517 
OCT  301,142530,321,101530,000,024517 
OCT  301,142530,321,101530,017,127014
OCT  001,136517,017,142157,017,124255 
OCT  014,124504,017,140757,322,005131 
OCT  003,024517,017,142757,322,005331 
OCT  017,140157,003,024536,017,136776 
OCT  220,056457,301,141470,017,154517 
OCT  320,005631,346,001217,017,164757 



OCT  320,005771,346,001417,017,152557 
OCT  017,150157,003,061051,320,047171
OCT  322,046371,010,043057,000,043057 
OCT  321,106670,017,124157,017,164517
OCT  017,162557,015,037517,017,153457 
OCT  017,151417,347,120157,003,042757 
OCT  322,050131,037,124504,000,043057 
OCT  320,007031,017,162154,004,126557 
OCT  321,007431,340,000157,240,024517 
OCT  017,164151,244,124517,325,001031 
OCT  322,047671,017,136750,157,124504 
OCT  157,126544,000,061417,301,142530 
OCT  321,101570,017,164517,017,162557 
OCT  301,142530,321,101570 
*  END  OF  "LOADS"  MICROCODE 
IBUFF  BSS  414B  INPUT  BUFFER 

END  WCSLD 
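
The header comment of WCSLD describes how each microword is split across the A and B registers on its way to WCS. The following is a minimal Python sketch of that packing, not part of the original listing. It assumes that the second word of each buffer pair (the lower 16 bits of the microword) goes out unchanged through the B register, as the DLD/OTA/OTB sequence suggests. The names `pack_a_register` and `write_stream` are hypothetical.

```python
MICROWORD_COUNT = 0o206   # loop count from "LDA =B-206" (negated in the listing)
INPUT_BUFFER_WORDS = 0o414  # size of IBUFF from "BSS 414B"


def pack_a_register(wcs_addr, micro_hi8):
    """Place the WCS address in bits 15-8 and the upper 8 microword bits in bits 7-0."""
    if not 0 <= wcs_addr <= 0o377:
        raise ValueError("WCS address must fit in 8 bits")
    if not 0 <= micro_hi8 <= 0o377:
        raise ValueError("upper microword bits must fit in 8 bits")
    return (wcs_addr << 8) | micro_hi8


def write_stream(buffer_words):
    """Yield (A, B) register pairs for a buffer laid out like OBUFF.

    buffer_words alternates: upper 8 bits, lower 16 bits, upper 8 bits, ...
    The WCS address starts at 0 and rises by one per microword; the listing
    achieves the same step by adding 400 octal to WCSAD.
    """
    for addr, i in enumerate(range(0, len(buffer_words), 2)):
        hi8, lo16 = buffer_words[i], buffer_words[i + 1]
        yield pack_a_register(addr, hi8), lo16


# First three microwords of the "LOADS" microcode buffer.
first_three = [0o321, 0o100130, 0o301, 0o170351, 0o220, 0o074457]
pairs = list(write_stream(first_three))
```

Incrementing the address field by one is the same as adding 400 octal to the whole A-register value, which is why the write loop and the read loop should both use that constant.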


Appendix  I 

This  appendix  contains  listings  for  CDRVR  and  CALC. 
CDRVR  is  a  special  driver  program,  which  was  written  to 
test  the  laser  materials  modeling  program  routine  CALC  on 
the  AFIT  21MX  computer.  CDRVR  provides  all  the  inputs  to 
CALC  which  would  normally  come  from  a  potentiometer  board 
on  the  AFWAL/MLPJ  computer. 




6 AUG 82
      PROGRAM CDRVR
C
C  CDRVR IS A TEST DRIVER PROGRAM FOR THE SUBROUTINE CALC, A
C  ROUTINE WHICH CALCULATES THE REAL AND IMAGINARY PARTS OF
C  REFRACTIVE INDEX, A CHARACTERISTIC MEASURE OF LASER
C  MATERIALS.  CDRVR IS USED TO DRIVE CALC ONLY FOR THE
C  PURPOSE OF MAKING TIMING MEASUREMENTS ON CALC.  CALC IS ONE
C  ROUTINE OF SEVERAL USED IN A LASER MODELING PROGRAM
C  DEVELOPED BY AFWAL/MLPJ.
C
      COMMON IXO(150),IYO(150),B(30),G(30),IA(30),IQ(20)
      REAL BB(30),F,N,K
      INTEGER I,JJ
C
C  B(30) -- AN ARRAY CONTAINING PARAMETERS NORMALLY INPUT
C           FROM A 30-POT POTENTIOMETER BOARD
C  N     -- THE REAL PART OF THE REFRACTIVE INDEX
C  K     -- THE IMAGINARY PART OF THE REFRACTIVE INDEX
C  F     -- RADIATION FREQUENCY
C  JJ    -- THE NUMBER OF OSCILLATORS USED IN THE REFRACTIVE
C           INDEX CALCULATION
C
      DATA BB/1.0,800.0,1.0,1.0,800.0,1.0,1.0,800.0,1.0,
     *  1.0,800.0,1.0,1.0,800.0,1.0,1.0,800.0,1.0,1.0,800.0,
     *  1.0,1.0,800.0,1.0,0.0,400.0,100.0,2.0,0.5,0.0/
      DATA JJ/8/
C
C  COPY DATA FROM DUMMY BB ARRAY TO B ARRAY IN COMMON AREA
C
      DO 50 I=1,30
      B(I)=BB(I)
   50 CONTINUE
C
C  PERFORM CALCULATIONS FOR FREQUENCIES FROM 1000 TO 200 IN
C  STEPS OF 40 TO MAKE APPROXIMATELY 20 CALCULATIONS.
C
      DO 100 I=1000,200,-40
      F=1.0*I
      CALL CALC(B,JJ,F,N,K)
      WRITE(10,200) F,N,K
  200 FORMAT(F20.9,2X,F20.9,2X,F20.9)
  100 CONTINUE
      END




      SUBROUTINE CALC(B,JJ,F,C11,C12)
C
C  SUBROUTINE CALC CALCULATES THE REAL AND IMAGINARY PARTS OF
C  THE REFRACTIVE INDEX OF A LASER MATERIAL SIMULATED BY
C  PARAMETERS INPUT FROM A POTENTIOMETER BOARD.  THE
C  CALCULATION IS PERFORMED BY EVALUATING EQUATIONS FOR
C  (N*N - K*K) AND (2*N*K) AND THEN SOLVING THESE TWO
C  EQUATIONS SIMULTANEOUSLY FOR N AND K (C11 AND C12).
C
      COMMON IXO(150),IYO(150),B(30)
      REAL C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13
      INTEGER F2,J1,J2,J3
C
C  C1-C13 -- USED FOR INTERIM RESULTS IN EVALUATION OF THE
C            TWO LONG EQUATIONS
C  F2     -- FREQUENCY (F) SQUARED
C  J1-J3  -- INDICES OF ARRAY B USED TO PICK OUT THREE
C            DIFFERENT PARAMETERS -- DAMPING FACTOR, FREQUENCY
C            OF RESONANCE OF THE ITH OSCILLATOR, AND STRENGTH
C            OF RESONANCE
C
      F2=F*F
      C5=0.0
      C6=0.0
      DO 100 J=1,JJ
      J3=J*3
      J2=J1-1
      J1=J2-1
      C1=B(J1)*F
      C2=B(J2)*B(J2)
      C3=C2-F2
      C4=(B(J3)*C2)/(C3*C3+C1*C1)
      C5=C5+C3*C4
      C6=C6+C1*C4
  100 CONTINUE
C
      C7=B(27)*B(27)+F2
      C8=B(26)*B(26)
C  C9=N*N-K*K
      C9=B(28)+C6-B(29)*C8/C7
C  C10=2*N*K
      C10=C6+B(29)*B(27)*C8/(F*C7)
C  NOW SOLVE THESE 2 EQUATIONS FOR N AND K (C12 AND C13)
      C11=0.5*(-C9+SQRT(C9*C9+C10*C10))
      C12=SQRT(C9+C11)
      C13=SQRT(C11)
      RETURN
      END
      END$
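
For readers who want to check the arithmetic of the DO loop in CALC, the following is a minimal Python sketch of the accumulation of C5 and C6, driven by the test data and frequency sweep from CDRVR. It is an illustration, not the author's code. It assumes the index relation J3=3J, J2=J3-1, J1=J2-1, which matches the B-array layout described in the MCALC comments (damping, resonance frequency, strength per oscillator), whereas the printed listing reads J2=J1-1. The function name `oscillator_sums` is hypothetical.

```python
def oscillator_sums(b, jj, f):
    """Accumulate C5 and C6 over jj oscillators.

    b is a 1-based view of the B array: b[0] is unused padding so that
    b[1], b[2], b[3] hold the first oscillator's three parameters.
    """
    f2 = f * f
    c5 = 0.0
    c6 = 0.0
    for j in range(1, jj + 1):
        j3 = 3 * j
        j2 = j3 - 1  # assumed; the printed listing reads J2=J1-1
        j1 = j2 - 1
        c1 = b[j1] * f
        c2 = b[j2] * b[j2]
        c3 = c2 - f2
        c4 = (b[j3] * c2) / (c3 * c3 + c1 * c1)
        c5 += c3 * c4
        c6 += c1 * c4
    return c5, c6


# CDRVR's DATA statement: eight identical oscillators, then B(25)-B(30).
b = [0.0] + [1.0, 800.0, 1.0] * 8 + [0.0, 400.0, 100.0, 2.0, 0.5, 0.0]

# CDRVR's sweep: DO 100 I=1000,200,-40 (both end points included).
frequencies = list(range(1000, 199, -40))

c5, c6 = oscillator_sums(b, 8, float(frequencies[0]))
```

With these inputs the sweep has 21 frequencies, which agrees with the driver's comment of "approximately 20" calculations.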


Appendix J

This appendix contains listings for CALC, ACALC, and
MCALC. The CALC in this listing is the same as in Appendix
I  except  that  the  DO  loop  has  been  replaced  by  a  call  to 
ACALC.  ACALC  is  an  assembly  language  program  which 
interfaces  CALC  to  MCALC.  MCALC  is  the  microprogram  which 
performs  the  function  previously  performed  by  the  DO  loop. 
CALC  is  again  driven  by  CDRVR  as  shown  in  Appendix  I. 




      SUBROUTINE CALC(B,JJ,F,C11,C12)
C
C  SUBROUTINE CALC CALCULATES THE REAL AND IMAGINARY PARTS OF
C  THE REFRACTIVE INDEX OF A LASER MATERIAL SIMULATED BY
C  PARAMETERS INPUT FROM A POTENTIOMETER BOARD.  THE
C  CALCULATION IS PERFORMED BY EVALUATING EQUATIONS FOR
C  (N*N - K*K) AND (2*N*K) AND THEN SOLVING THESE TWO
C  EQUATIONS SIMULTANEOUSLY FOR N AND K (C11 AND C12).
C
      COMMON IXO(150),IYO(150),B(30)
      REAL C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,F2
      INTEGER J1,J2,J3
C
C  C1-C13 -- USED FOR INTERIM RESULTS IN EVALUATION OF THE
C            TWO LONG EQUATIONS
C  F2     -- FREQUENCY (F) SQUARED
C  J1-J3  -- INDICES OF ARRAY B USED TO PICK OUT THREE
C            DIFFERENT PARAMETERS -- DAMPING FACTOR, FREQUENCY
C            OF RESONANCE OF THE ITH OSCILLATOR, AND STRENGTH
C            OF RESONANCE
C
C  EVALUATE THE TWO EQUATIONS OVER JJ OSCILLATORS.
C  THIS IS DONE BY MICROPROGRAM MCALC WHICH IS INVOKED
C  BY THE ASSEMBLY LANGUAGE ROUTINE ACALC.  RESULTS ARE
C  RETURNED IN C5 AND C6.
C
      F2=F*F
      C5=0.0
      C6=0.0
      CALL ACALC(JJ,F,F2,C5,C6)
      C7=B(27)*B(27)+F2
      C8=B(26)*B(26)
C  C9=N*N-K*K
      C9=B(28)+C6-B(29)*C8/C7
C  C10=2*N*K
      C10=C6+B(29)*B(27)*C8/(F*C7)
C  NOW SOLVE THESE 2 EQUATIONS FOR N AND K (C12 AND C13)
      C11=0.5*(-C9+SQRT(C9*C9+C10*C10))
      C12=SQRT(C9+C11)
      C13=SQRT(C11)
      RETURN


ASMB,L
       NAM  ACALC,7
       EXT  .ENTR
       ENT  ACALC
       COM  IXO(150),IYO(150),B(60)
.JJ    BSS  1
..F    BSS  1
..F2   BSS  1
..C5   BSS  1
..C6   BSS  1
ACALC  NOP
       JSB  .ENTR
       DEF  .JJ
       DLD  ..F        GET F AND F2 ADDRESSES
       DST  .F         COPY INTO .F AND .F2
       DLD  ..C5       GET C5 AND C6 ADDRESSES
       DST  .C5        COPY INTO .C5 AND .C6
       LDX  .B         PUT B ARRAY ADDRESS INTO X-REG
       LDY  .JJ,I      PUT JJ INTO Y FOR LOOP COUNT IN MCALC
       LDA  .TMP1      PUT TMP1 ADDRESS INTO A-REG
       LDB  .TMP2      PUT TMP2 ADDRESS INTO B-REG
MCAL1  OCT  105620     INVOKE MCALC AT 1ST ENTRY POINT
.F     BSS  1          ADDRESS OF PARAMETER F
.F2    BSS  1          ADDRESS OF PARAMETER F2
FDIV   OCT  105060     INVOKE FLT PNT DIVIDE ROM ROUTINE
.C1C3  DEF  C1C3       ARGUMENT FOR FDV
MCAL2  OCT  105621     INVOKE MCALC AT 2ND ENTRY POINT
.C5    BSS  1          OUTPUT PARAMETERS OF MCALC
.C6    BSS  1
       JMP  ACALC,I    RETURN TO CALC
.B     DEF  B          ADDRESS OF B ARRAY
.TMP1  DEF  TMP1       ADDRESS OF TMP1
.TMP2  DEF  TMP2       ADDRESS OF TMP2
TMP1   BSS  2          TMP1-TMP3 ARE WORKING
TMP2   BSS  2          LOCATIONS FOR MCALC
TMP3   BSS  2          TMP3 MUST FOLLOW TMP2
C1C3   BSS  2          HOLDS C1*C1+C3*C3
       END  ACALC

MICMX,L,R  21MX
$CODE=%MCALC::20,REPLACE   OBJECT TO DISK
          ORG 6000B
******************************************************
*  *
*  MCALC MICROPROGRAM  *
*  *
*  THIS MICROPROGRAM IS A SUBSTITUTE FOR THE FOLLOWING  *
*  LOOP IN THE FORTRAN SUBROUTINE CALC:  *
*  *
*  DO 300 J=1,JJ  *
*  J3=J*3  *
*  J2=J1-1  *
*  J1=J2-1  *
*  C1=B(J1)*F  *
*  C2=B(J2)*B(J2)  *
*  C3=C2-F2  *
*  C4=(B(J3)*C2)/(C3*C3+C1*C1)  *
*  C5=C5+C3*C4  *
*  C6=C6+C1*C4  *
*  300 CONTINUE  *
*  *
*  MCALC IS INVOKED BY FIRST CALLING AN ASSEMBLY  *
*  LANGUAGE ROUTINE CALLED ACALC FROM CALC AT THE  *
*  POINT WHERE THE ABOVE LOOP RESIDED.  ACALC THEN  *
*  INVOKES THE MICROPROGRAM WITH THE FOLLOWING  *
*  INSTRUCTIONS:  *
*  *
*  LDX .B  PUT B ARRAY ADDR INTO X-REG  *
*  LDY .JJ,I  PUT JJ INTO Y FOR LOOP COUNTER  *
*  LDA .TMP1  PUT TMP1,TMP2,TMP3  *
*  LDB .TMP2  TMP1,TMP2,TMP3 ARE DEFINED AS  *
*  "BSS 2".  TMP3 MUST IMMEDIATELY  *
*  FOLLOW TMP2 TO PASS ITS ADDRESS.  *
*  MCAL1 OCT 105620  INVOKE MCALC AT 1ST ENTRY POINT  *
*  .F BSS 1  ADDR OF F  *
*  .F2 BSS 1  ADDR OF F2 (F2=F*F)  *
*  FDIV OCT 105060  INVOKE FLT PNT DIVIDE ROM ROUTINE  *
*  .C1C3 DEF C1C3  ARGUMENT FOR FLT PNT DIVIDE  *
*  MCAL2 OCT 105621  INVOKE MCALC AT 2ND ENTRY POINT  *
*  .C5 BSS 1  OUTPUT PARAMETERS OF MCALC  *
*  .C6 BSS 1  *
*  *
*  MCALC IS INVOKED TWICE AT TWO DIFFERENT ENTRY  *
*  POINTS.  THE REASON FOR THIS IS THAT A FLOATING  *
*  POINT DIVIDE MUST BE PERFORMED IN THE MIDDLE OF  *
*  MCALC, BUT THE DIVIDE ROUTINE WILL NOT FIT IN WCS,  *
*  SO THE ROM ROUTINE IS USED.  THIS REQUIRES A RETURN  *
*  TO ACALC TO INVOKE THE ROM ROUTINE.  MCALC IS THEN  *
*  REENTERED TO COMPLETE ITS OPERATION.  *
******************************************************


MPYX      EQU %0246                 ROM FLT PNT MPYX ROUTINE
FLD       EQU %7031                 ROM FLT PNT LOAD ROUTINE
PACK      EQU %7052                 ROM FLT PNT PACK ROUTINE
START1    JMP MCALC1                GO TO 1ST ENTRY POINT
START2    JMP MCALC2                GO TO 2ND ENTRY POINT


****************************************************** 

*  THE  FOLLOWING  RETURN  TABLE  IS  USED  TO  JUMP  BACK  TO  * 

*  THE  MAIN  MICROPROGRAM  FROM  SUBROUTINES  FMPY,  FADD  OR  *

*  FSUB.  NORMAL  RETURNS  CANNOT  BE  MADE  BECAUSE  THE  * 

*  RETURN  ADDRESS  IS  LOST  WHEN  FMPY, FADD  OR  FSUB  CALL  * 

*  OTHER  ROUTINES.  THIS  JUMP  TABLE  IS  USED  AS  FOLLOWS:* 

*  RTNTABLE  IS  LOCATED  AT  6002  AND  CONTAINS  A  JUMP  TO  * 

*  THE  1ST  LOCATION  FOLLOWING  THE  1ST  CALL  TO  FMPY.  * 

*  BEFORE  FMPY  IS  CALLED,  THE  IR-REG  IS  LOADED  WITH  * 

*  VALUE  2  IN  BITS  0-3.  THE  RETURN  FROM  FMPY  IS  VIA  A  * 

*  "JMP  J74"  USING  BITS  4-7  OF  THE  IR,  SO  THE  RETURN  *

*  INDEX  FOR  A  FMPY  AND  A  SUBSEQUENT  FADD  OR  FSUB  CAN  * 

*  BE  LOADED  INTO  THE  IR  AT  THE  SAME  TIME.  * 

****************************************************** 


RTNTABLE  JMP RTNPNT1               TABLE OF JUMPS TO
          JMP RTNPNT2               RETURN POINTS FROM
          JMP RTNPNT3               SUBROUTINE CALLS.
          JMP RTNPNT4               BEFORE JUMPING TO A
          JMP RTNPNT5               SUBROUTINE THE LOWER 4
          JMP RTNPNT6               BITS OF THE RTNTABLE
          JMP RTNPNT7               JMP ENTRY ARE LOADED
          JMP RTNPNT8               INTO THE IR.  THE
          JMP RTNPNT9               SUBROUTINE DOES A "JMP
          JMP RTNPNT10              J30 RTNTABLE" TO
          JMP RTNPNT11              RETURN TO A CALLER.
*                                   THE "J30" REPLACES THE
*                                   LOWER 4 BITS OF THE
*                                   JMP ADDR WITH THE 4 IR
*                                   BITS
DEBUG     JMP DEBUG                 DUMMY ENTRY FOR DEBUG

******************************************************
*  SET UP CALLING PARAMETERS.  *
*  SCRATCH REGISTERS:  JJ --> Y-REG  *
*  .B --> X-REG  *
*  .TMP1 --> S4  *
*  .TMP2 --> S8  *
*  .TMP3 --> S12  *
******************************************************
MCALC1    PASS S P                  SAVE P IN S
          PASS S4 A                 USE S4 AS POINTER TO TMP1
          PASS S8 B                 USE S8 AS POINTER TO TMP2
          INC S12 S8                USE S12 AS POINTER TO TMP3
          INC S12 S12               NOTE TMP3 IS AT TMP2+2
******************************************************
*  CALCULATE C1.  C1=GAMMA1(J)*F  *
*  NOTE THAT GAMMA1(J), NU(J), AND RHO(J) ARE ELEMENTS  *
*  OF THE B ARRAY, AND ARE ARRANGED IN THE ARRAY IN  *
*  THAT ORDER.  I.E., B(1)=GAMMA1(1), B(2)=NU(1), B(3)=  *
*  RHO(1), B(4)=GAMMA1(2), B(5)=NU(2), B(6)=RHO(2) ...  *
*  B(22)=GAMMA1(8), B(23)=NU(8), AND B(24)=RHO(8).  *
******************************************************


LOOP      READ INC M X              READ GAMMA1 ELEMENT FROM B
          INC X X                   POINT TO 2ND GAMMA1 WORD
          PASS A TAB                PUT 1ST GAMMA1 WORD INTO A
          READ INC M X              READ 2ND GAMMA1 WORD
          INC X X                   POINT TO NU ELEMENT OF B
          PASS B TAB                PUT 2ND GAMMA1 WORD INTO B
          READ INC PNM P            READ F ADDR. POINT TO .F2
          PASS S3 TAB               F ADDR INTO S3 FOR MPY
          IMM CMLO S1 %374          LO 4 MAP TO "JMP RTNPNT1"
          PASS IR S1
          JMP FMPY                  GAMMA1(J)*F=C1 RETURN IN A
RTNPNT1   MPCK INC M S4             POINT M AT TMP1
          WRTE PASS TAB A           WRITE 1ST C1 WORD TO TMP1
          INC S3 S4                 POINT S3 TO 2ND TMP1 WORD
          MPCK INC M S3             NOW, SO DOES M
          WRTE PASS TAB B           WRITE 2ND C1 WORD TO TMP1

******************************************************
*  CALCULATE C2.  C2=NU(J)*NU(J)  *
******************************************************
          PASS S3 X                 NU(J) ADDR INTO S3
          READ INC M X              READ NU ELEMENT FROM B
          INC X X                   POINT TO 2ND WORD OF NU
          PASS A TAB                PUT 1ST WORD OF NU INTO A
          READ INC M X              READ 2ND WORD OF NU
          INC X X                   POINT TO RHO ELEMENT OF B
          PASS B TAB                PUT 2ND WORD OF NU INTO B
          IMM CMLO S1 %253          LO 4 MAP TO "JMP RTNPNT2"
          PASS IR S1                HI 4 MAP TO "JMP RTNPNT3"
          JMP FMPY                  NU(J)*NU(J) RETURNS IN AB
RTNPNT2   MPCK INC M S8             POINT M AT TMP2
          WRTE PASS TAB A           WRITE 1ST C2 WORD TO TMP2
          INC S3 S8                 POINT S3 TO 2ND TMP2 WORD
          MPCK INC M S3             AND M ALSO
          WRTE PASS TAB B           WRITE 2ND C2 WORD

******************************************************
*  CALCULATE C3.  C3=C2-F2  *
******************************************************
          READ INC PNM P            READ .F2. POINT TO FDV INS
          PASS S3 TAB               PUT F2 ADDR (.F2) INTO S3
          JMP FADDSUB               C2-F2 RETURNS IN A/B
RTNPNT3   MPCK INC M S12            POINT M AT TMP3
          WRTE PASS TAB A           WRITE 1ST C3 WORD TO TMP3
          INC S3 S12                S3 POINTS TO TMP3+1
          MPCK INC M S3             AND SO DOES M
          WRTE PASS TAB B           WRITE 2ND C3 WORD TO TMP3

******************************************************
*  CALCULATE C4.  C4=(RHO(J)*C2)/(C3*C3+C1*C1)  *
******************************************************
          READ INC M S8             READ 1ST WORD OF C2 (TMP2)
          INC S3 S8                 S3 POINTS TO 2ND WORD
          PASS A TAB                1ST WORD OF C2 INTO A-REG
          READ INC M S3             READ 2ND WORD
          PASS S3 X                 PASS S3 AT RHO ELEMENT OF
          INC X X                   INC X BY 2 TO POINT AT
          INC X X                   NEXT GAMMA
          PASS B TAB                2ND WORD OF C2 INTO B-REG
          IMM CMLO S1 %371          LO 4 MAP TO "JMP RTNPNT4"
          PASS IR S1
          JMP FMPY                  RHO(J)*C2 RETURNS IN A/B
RTNPNT4   MPCK INC M S8             POINT M AT TMP2
          WRTE PASS TAB A           WRITE 1ST WORD RHO(J)*C2
          INC S3 S8                 POINT S3 TO 2ND WORD TMP2
          MPCK INC M S3             AND M ALSO
          WRTE PASS TAB B           WRITE 2ND WORD RHO(J)*C2
          READ INC M S12            READ 1ST WORD OF C3
          INC S3 S12                POINT S3 AT 2ND WORD
          PASS A TAB                PUT 1ST WORD INTO A-REG
          READ INC M S3             READ IN 2ND WORD OF C3
          PASS S3 S12               POINT S3 BACK AT 1ST WORD
          PASS B TAB                PUT 2ND WORD INTO B-REG
          IMM CMLO S1 %370          LO 4 MAP TO "JMP RTNPNT5"
          PASS IR S1
          JMP FMPY                  C3*C3 RETURNS IN A/B
RTNPNT5   INC S3 P                  POINT S3 AT .C1C3 ADDRESS
          READ INC M S3             READ C1C3 ADDRESS
          PASS S3 TAB               AND PUT INTO S3
          MPCK INC M S3             POINT M AT C1C3
          WRTE PASS TAB A           WRITE 1ST WORD OF C3*C3
          INC S3 S3                 POINT S3 AT 2ND WORD C1C3
          MPCK INC M S3             AND M ALSO
          WRTE PASS TAB B           WRITE 2ND WORD OF C3*C3
          READ INC M S4             READ IN 1ST WORD OF C1
          INC S3 S4                 POINT S3 AT 2ND WORD
          PASS A TAB                PUT 1ST WORD INTO A-REG
          READ INC M S3             READ IN 2ND WORD OF C1
          PASS S3 S4                POINT S3 BACK AT 1ST WORD
          PASS B TAB                PUT 2ND WORD INTO B-REG
          IMM CMLO S1 %147          LO 4 MAP TO "JMP RTNPNT6"
          PASS IR S1                HI 4 MAP TO "JMP RTNPNT7"
          JMP FMPY                  C1*C1 RETURNS IN A/B
RTNPNT6   INC S3 P                  POINT S3 AT .C1C3 ADDRESS
          READ INC M S3             READ C1C3 ADDRESS
          STFL PASS S3 TAB          INTO S3. STFL FOR NEXT ADD
          JMP FADDSUB               C1*C1+C3*C3 RETURNS IN AB
RTNPNT7   INC S3 P                  POINT S3 AT .C1C3 ADDRESS
          READ INC M S3             READ C1C3 ADDRESS
          PASS S3 TAB               AND PUT INTO S3
          MPCK INC M S3             AND INTO M
          WRTE PASS TAB A           WRITE 1ST WORD C1*C1+C3*C3
          INC S3 S3                 POINT S3 AT WORD 2 OF C1C3
          MPCK INC M S3             AND M ALSO
          WRTE PASS TAB B           WRITE 2ND WORD C1*C1+C3*C3
          READ INC M S8             READ 1ST WORD RHO(J)*C2
          INC S3 S8                 POINT AT 2ND WORD
          PASS A TAB                PUT 1ST WORD INTO A-REG
          READ INC M S3             READ 2ND WORD
          RTN PASS B TAB            PUT 2ND WORD INTO B
******************************************************
*  AT THIS POINT EVERYTHING IS SET UP FOR THE DIVIDE  *
*  OF RHO(J)*C2 BY C1*C1+C3*C3.  RETURN TO THE ASSEMBLY  *
*  LANGUAGE ROUTINE TO INVOKE THE FLOATING POINT  *
*  DIVIDE ROUTINE.  RETURN TO MICROCODE AT MCALC2 WITH  *
*  THE RESULT IN A/B REGS AND OTHER REGS INTACT.  *
******************************************************
MCALC2    MPCK INC M S8             POINT M AT 1ST WORD TMP2
          WRTE PASS TAB A           1ST WORD OF C4 INTO TMP2
          INC S3 S8                 POINT S3 AT 2ND TMP2 WORD
          MPCK INC M S3             AND M ALSO
          WRTE PASS TAB B           2ND WORD OF C4 INTO TMP2
******************************************************
*  CALCULATE C5.  C5=C5+C3*C4  *
******************************************************


          PASS S3 S12               ADDRESS OF C3 INTO S3
          IMM CMLO S1 %105          LO 4 MAP TO "JMP RTNPNT8"
          PASS IR S1                HI 4 MAP TO "JMP RTNPNT9"
          JMP FMPY                  C3*C4 RETURNS IN A/B
RTNPNT8   MPCK INC M P              READ C5 ADDRESS
          STFL PASS S3 TAB          INTO S3. STFL FOR NEXT ADD
          JMP FADDSUB               C5+C3*C4 RETURNS IN A/B
RTNPNT9   READ INC PNM P            GET C5 ADDR. POINT C6 ADDR
          PASS S3 TAB               AND PUT INTO S3
          MPCK INC M S3             C5 ADDRESS INTO M
          WRTE PASS TAB A           1ST WORD OF C5 STORED
          INC S3 S3                 POINT TO 2ND WORD
          MPCK INC M S3             AND M ALSO
          WRTE PASS TAB B           2ND WORD OF C5 STORED

******************************************************
*  CALCULATE C6.  C6=C6+C1*C4  *
******************************************************
          READ INC M S4             READ IN 1ST WORD OF C1
          INC S3 S4                 POINT S3 AT 2ND WORD OF C1
          PASS A TAB                1ST WORD OF C1 INTO A-REG
          READ INC M S3             READ IN 2ND WORD OF C1
          PASS S3 S8                POINT S3 AT C4
          PASS B TAB                2ND WORD OF C1 INTO B-REG
          IMM CMLO S1 %043          LO 4 MAP TO "JMP RTNPNT10"
          PASS IR S1                HI 4 MAP TO "JMP RTNPNT11"
          JMP FMPY                  C1*C4 RETURNS IN A/B
RTNPNT10  READ INC M P              READ C6 ADDRESS
          STFL PASS S3 TAB          INTO S3. STFL FOR NEXT ADD
          JMP FADDSUB               C6+C1*C4 RETURNS IN A/B
RTNPNT11  READ MPCK INC PNM P       GET C6 ADR. POINT TO NEXT
          PASS S3 TAB               PUT C6 ADDRESS INTO S3
          INC M S3                  AND INTO M
          WRTE PASS TAB A           1ST WORD OF C6 STORED
          INC S3 S3                 POINT TO 2ND WORD
          MPCK INC M S3             M ALSO
          WRTE PASS TAB B           2ND WORD OF C6 STORED
          DEC Y Y                   DECREMENT LOOP COUNT
          JMP CNDX TBZ RTNMAC       IF DONE, RETURN
          PASS P S                  POINT P AT .F & DO AGAIN
          JMP LOOP
RTNMAC    RTN                       RETURN TO ACALC

****************************************************** 

*  FLOATING  POINT  MULTIPLY  ROUTINE.  THIS  ROUTINE  IS  * 

*  TAKEN  FROM  APPENDIX  E  OF  THE  HP  MICROPROGRAMMING  * 

*  21MX  COMPUTERS  OPERATING  AND  REFERENCE  MANUAL.  THIS* 

*  ROUTINE  ALSO  RESIDES  IN  CONTROL  STORE  ROM,  BUT  IT  * 

*  IS  NECESSARY  TO  REPRODUCE  IT  IN  WCS  TO  AVOID  THE  * 

*  PROBLEM  OF  LEVELED  SUBROUTINE  CALLS  IN  THE  M-SERIES* 

*  COMPUTER.  FMPY  HAS  BEEN  MODIFIED  SLIGHTLY  TO  HANDLE* 

*  THE  ARGUMENT  ADDRESS  IN  REGISTER  S3  RATHER  THAN  P.  * 

*  THE  RETURN  TO  THE  CALLING  ROUTINE  HAS  BEEN  MODIFIED* 

*  TO  A  "JMP  J30  RTNTABLE"  AS  DISCUSSED  AT  "RTNTABLE" . * 

*  ALSO,  INDIRECT  ADDRESSING  IS  NOT  SUPPORTED.  * 

****************************************************** 


FMPY      READ INC M S3             READ 1ST PARAMETER WORD
          JSB FLD                   STORE ARGS IN SCRATCH REGS
          INC S9 S9
          PASS L S5                 FORM EXP1+EXP2+1
          ADD S9 S9                 AND SAVE IN S9
          R1 PASS A S10             FORM (WORD1 LOBITS)/2 IN A
          PASS S2 S7                PASS WORD2 HIBITS INTO S2
          JSB MPYX                  JMP TO MPY SUB & RTN WITH
          PASS S5 B                 HIBITS IN B & SAVE IN S5
          PASS S2 S11               PASS WORD1 HIBITS INTO S2
          PASS S11 A                LOBITS INTO A. SAVE INTO S
          R1 PASS A S6              FORM (WORD2 LOBITS)/2 IN A
          JSB MPYX                  JMP TO MPY SUB & RTN WITH
          PASS L A                  LOBITS IN A & PASS INTO L
          ADD A S11                 ADD BOTH LOBITS. CHK FOR C
          JMP CNDX COUT RJS *+2     (ELSE TRUNCATE DIGITS)
          INC B B                   IF COUT, BUMP HIBITS
          PASS L B                  ADD HIBITS & SAVE IN S11
          ADD S11 S5
          PASS A S7                 PASS WORD2 HIBITS INTO A
          JSB MPYX                  JMP TO MPY SUB & RTN WITH
          R1 PASS A A               LOBITS IN A. SAVE LOBITS/2
          COV PASS L A              ADD LOBITS/2 TO HIBITS SUM
          ENV L1 ADD A S11          SHIFT L1 TO REORIENT
          JMP CNDX AL15 RJS *+3     CHECK FOR CARRY INTO OR
          JMP CNDX OVFL *+4         BORROW FROM HIBITS &
          DEC B B                   ADJUST ACCORDINGLY
          JSB PACK
          JMP J30 RTNTABLE          RTN TO MAIN MICROPROGRAM
          INC B B                   CAN'T OVERFLOW FROM HIBITS
          JSB PACK
          JMP J30 RTNTABLE          RTN TO MAIN MICROPROGRAM

****************************************************** 

*  FLOATING  POINT  ADD  SUBTRACT  ROUTINE.  THIS  ROUTINE  * 

*  IS  TAKEN  FROM  APPENDIX  E  OF  THE  HP  MICROPROGRAMMING* 

*  21MX  COMPUTERS  OPERATING  AND  REFERENCE  MANUAL.  IT  * 

*  ALSO  RESIDES  IN  CONTROL  STORE  ROM  BUT  IS  DUPLICATED* 

*  IN  WCS  TO  AVOID  THE  PROBLEM  OF  LEVELED  SUBROUTINE  * 
*  CALLS  IN  THE  M-SERIES.  FADD  HAS  BEEN  MODIFIED  TO  *

*  ALLOW  A  PARAMETER  ADDRESS  IN  SCRATCH  REGISTER  S3  * 

*  RATHER  THAN  THE  P  REGISTER.  INDIRECT  ADDRESSING  IS  * 

*  NOT  USED,  SO  THE  CALL  TO  "INDIRECT"  IS  OMITTED.  * 

*  ALSO,  SCRATCH  REGISTER  S2  IS  USED  IN  PLACE  OF  S8  TO* 

*  FREE  S8  FOR  USE  IN  THE  MAIN  PROGRAM.  THE  RETURN  TO  * 

*  THE  CALLING  ROUTINE  HAS  ALSO  BEEN  MODIFIED  TO  A  * 

*  "JMP  J74  RTNTABLE"  AS  DISCUSSED  AT  "RTNTABLE " .  * 

****************************************************** 


FADD      READ INC M S3             READ 1ST PARAMETER WORD
          JSB FLD                   UNPACK WORDS INTO SCR REGS
          PASS B S7                 CHECK FOR WORD2=0
          JMP CNDX TBZ RJS *+2      IF NOT, CONTINUE
          IMM LOW S5 %200           IF SO, MAKE EXP MOST NEG
          PASS S11                  CHECK FOR WORD1=0
          JMP CNDX TBZ RJS *+2      IF NOT, CONTINUE
          IMM LOW S9 %200           IF SO, MAKE EXP MOST NEG
          JMP CNDX FLAG DIFR        IF DOING ADD, SKIP AHEAD
          CMPS B B                  FORM 2-COMP OF HIBITS IN B
          CMPS S6 S6                FORM 2-COMP OF LOBITS
          INC S6 S6                 OF WORD2
          JMP CNDX COUT RJS DIFR    IF COUT OCCURS
          INC B B                   BUMP HIBITS
          JMP CNDX AL15 RJS DIFR    CHECK SIGN IF POS,JUMP
          L1 PASS B                 IF NEG, CHECK FOR MOST
          JMP CNDX TBZ RJS DIFR     NEG # (100...)
          R1 PASS B B               IF SO, SHIFT BACK (010...)
          INC S5 S5                 & BUMP EXP
DIFR      PASS A S6
          PASS L S5                 FIND DIFF IN EXPS
          CLFL SUB S2 S9            & STORE IN S2, FLAG=0
          JMP CNDX TBZ ADD2         IF DIFF=0, JMP TO ADD STEP
          JMP CNDX AL15 RVRS        IF NEG, WORD2>WORD1
          CMPS S2 S2                FORM -DIFF
          INC S2 S2                 & STORE -DIFF IN S2
          JMP SWAMPCHK
RVRS      PASS L B                  HOLD B IN L
          PASS B S11                WORD1<WORD2, FILL IN B,A
          PASS A S10                WITH S11,S10
          PASL S11                  ALSO FILL S11,S10,S9
          PASS S10 S6               WITH B,S6,S5
          PASS S9 S5
SWAMPCHK  IMM LOW L %350            FORM -30B8 IN L
          SUB S2                    IF -DIFF>-31, RTN WITH
*                                   LARGER #
          JMP CNDX AL15 OUT         JMP TO RESTORE A,B
SHIFT     ARS R1 PASS B B           NOW START SHIFT LOOP
          INC S2 S2                 INC COUNTER
          JMP CNDX TBZ RJS SHIFT    LOOP UNTIL DONE
ADD2      COV PASS L S10            PASS LOBITS INTO L
          ADD A A                   ADD & CHECK FOR COUT
          JMP CNDX COUT RJS *+3     IF NOT, JUMP
          IMM HIGH L %0             CLR L(15) FOR OVFL
          ENV INC B B               IF SO, INC HIBITS & ENABLE
*                                   OVERFLOW
          CLFL PASS L S11           FLAG=0
          ENV ADD B B               ADD HIBITS & ENABLE OVFL
          JMP CNDX OVFL RJS PKSUB   IF NO OVFL, RETURN
          JMP CNDX AL15 *+2         OVFL IMPLIES SIGN CHANGE
          STFL                      SO FLAG=1 IF AL15=0
          LWF R1 PASS B B           DO FULL WORD SHIFT
          LWF R1 PASS A A           USING FLAG REG TO INJECT
*                                   SIGN
          INC S9 S9                 BUMP EXP
PKSUB     JSB PACK                  REPACK A,B REGS
          JMP J74 RTNTABLE          RTN TO MAIN MICROPROGRAM
OUT       PASS B S11                PASS MUCH LARGER WORD INTO
*                                   B,A
          PASS A S10
          JSB PACK
          JMP J74 RTNTABLE          RTN TO MAIN MICROPROGRAM
          END
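
The RTNTABLE comments explain that a single load of the IR can carry two return indices: FMPY returns through "JMP J30" (IR bits 0-3) and the add/subtract routine returns through "JMP J74" (IR bits 4-7). The following is a minimal Python sketch of how the immediates in the listing split into those two fields, not part of the original microprogram. It assumes that "IMM CMLO" leaves the one's complement of the 8-bit immediate in the low byte of S1. The function name `decode_cmlo` is hypothetical, and the value %105 is read from a character that is unclear in the scan.

```python
def decode_cmlo(immediate):
    """Return (low_nibble, high_nibble) of the complemented 8-bit immediate.

    low_nibble  -> IR bits 0-3, the index used by "JMP J30 RTNTABLE"
    high_nibble -> IR bits 4-7, the index used by "JMP J74 RTNTABLE"
    """
    value = (~immediate) & 0xFF  # assumed effect of IMM CMLO on the low byte
    return value & 0xF, (value >> 4) & 0xF


# Immediates in the order they appear in the main microprogram.
IMMEDIATES = [0o374, 0o253, 0o371, 0o370, 0o147, 0o105, 0o043]

decoded = [decode_cmlo(v) for v in IMMEDIATES]
```

Under that assumption, the four immediates whose comments name both a "LO 4" and a "HI 4" return point (%253, %147, %105, %043) decode to consecutive nibble pairs, which is consistent with a multiply that is followed directly by an add or subtract using the next table entry.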